|
Model parameters: d_model 768 ffw_size 3072 kv_size 64 n_heads 12 n_layers 15 |
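As a rough cross-check of the "146m" in the run name, the sketch below counts parameters for a standard GPT-2-style decoder with the dimensions listed above. The layer layout (biased linear layers, two LayerNorms per block, learned absolute position embeddings, output layer tied to the input embedding) is an assumption, not something this log confirms. The vocabulary size of 50304 is the padded size reported later in this log, and `gpt_param_count` is a hypothetical helper name.

```python
def gpt_param_count(d_model, ffw_size, kv_size, n_heads, n_layers,
                    vocab_size, max_positions):
    """Estimate parameters of a GPT-2-style decoder (assumed layout)."""
    attn_dim = kv_size * n_heads
    # Fused QKV projection (weights + biases) and output projection.
    attn = d_model * 3 * attn_dim + 3 * attn_dim
    attn += attn_dim * d_model + d_model
    # Two-layer feed-forward block with biases.
    mlp = d_model * ffw_size + ffw_size + ffw_size * d_model + d_model
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * 2 * d_model
    per_layer = attn + mlp + norms
    # Token and position embeddings; the output layer is assumed tied.
    embeddings = vocab_size * d_model + max_positions * d_model
    final_norm = 2 * d_model
    return n_layers * per_layer + embeddings + final_norm


total = gpt_param_count(d_model=768, ffw_size=3072, kv_size=64, n_heads=12,
                        n_layers=15, vocab_size=50304, max_positions=2048)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes out at about 146.5M, which is in line with the run name.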
|
Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 15 --hidden-size 768 --num-attention-heads 12 --kv-channels 64 --ffn-hidden-size 3072 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 84_762_549 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-146m174b100m --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 84_762_549 --lr-warmup-samples 847_625 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 100 --save-interval 10000 --eval-interval 10000 --eval-iters 1 --tensorboard-dir tensorboard_146m174b100m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_146m174b100m --load checkpoints_146m174b100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3319491.json --zero-stage 0 |
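The sample-based flags in the command above imply a token budget, an iteration count, and a warmup fraction. The sketch below derives them using only values from the command line. Reading "174b" in the run name as the token budget is an assumption.

```python
# Values copied from the launch command.
SEQ_LENGTH = 2048
TRAIN_SAMPLES = 84_762_549
GLOBAL_BATCH_SIZE = 256
LR_WARMUP_SAMPLES = 847_625

# Each sample is one full sequence, so tokens = samples * sequence length.
total_tokens = TRAIN_SAMPLES * SEQ_LENGTH
# Number of complete optimizer steps at the fixed global batch size.
full_iterations = TRAIN_SAMPLES // GLOBAL_BATCH_SIZE
# Share of the run spent in learning-rate warmup.
warmup_fraction = LR_WARMUP_SAMPLES / TRAIN_SAMPLES

print(f"total tokens:    {total_tokens:,}")
print(f"full iterations: {full_iterations:,}")
print(f"warmup fraction: {warmup_fraction:.4f}")
```

The token count is roughly 173.6B and the warmup covers about 1% of the training samples.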
|
START 3319491: Fri 17 Mar 2023 01:50:53 PM EET |
|
0: |
|
0: |
|
0: ======================= ROCm System Management Interface ======================= |
|
0: ================================= Concise Info ================================= |
|
0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
0: 0 46.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
0: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
0: 2 40.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
0: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
0: 4 45.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
0: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
0: 6 38.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
0: 7 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
0: ================================================================================ |
|
0: ============================= End of ROCm SMI Log ============================== |
|
7: |
|
7: |
|
7: ======================= ROCm System Management Interface ======================= |
|
7: ================================= Concise Info ================================= |
|
7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
7: 0 45.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
7: 1 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
7: 2 38.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
7: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
7: 4 43.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
7: 5 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
7: 6 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
7: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
7: ================================================================================ |
|
7: ============================= End of ROCm SMI Log ============================== |
|
1: |
|
1: |
|
1: ======================= ROCm System Management Interface ======================= |
|
1: ================================= Concise Info ================================= |
|
1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
1: 0 45.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
1: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
1: 2 42.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
1: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
1: 4 49.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
1: 5 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
1: 6 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
1: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
1: ================================================================================ |
|
1: ============================= End of ROCm SMI Log ============================== |
|
4: |
|
4: |
|
4: ======================= ROCm System Management Interface ======================= |
|
4: ================================= Concise Info ================================= |
|
4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
4: 0 49.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
4: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
4: 2 41.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
4: 3 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
4: 4 42.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
4: 5 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
4: 6 43.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
4: 7 39.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
4: ================================================================================ |
|
4: ============================= End of ROCm SMI Log ============================== |
|
5: |
|
5: |
|
5: ======================= ROCm System Management Interface ======================= |
|
5: ================================= Concise Info ================================= |
|
5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
5: 0 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
5: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
5: 2 42.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
5: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
5: 4 44.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
5: 5 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
5: 6 35.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
5: 7 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
5: ================================================================================ |
|
5: ============================= End of ROCm SMI Log ============================== |
|
3: |
|
3: |
|
3: ======================= ROCm System Management Interface ======================= |
|
3: ================================= Concise Info ================================= |
|
3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
3: 0 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
3: 1 52.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
3: 2 46.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
3: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
3: 4 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
3: 5 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
3: 6 47.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
3: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
3: ================================================================================ |
|
3: ============================= End of ROCm SMI Log ============================== |
|
2: |
|
2: |
|
2: ======================= ROCm System Management Interface ======================= |
|
2: ================================= Concise Info ================================= |
|
2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
2: 0 45.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
2: 1 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
2: 2 40.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
2: 3 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
2: 4 45.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
2: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
2: 6 40.0c 84.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
2: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
2: ================================================================================ |
|
2: ============================= End of ROCm SMI Log ============================== |
|
6: |
|
6: |
|
6: ======================= ROCm System Management Interface ======================= |
|
6: ================================= Concise Info ================================= |
|
6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% |
|
6: 0 48.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
6: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
6: 2 41.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
6: 3 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
6: 4 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
6: 5 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
6: 6 39.0c 87.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% |
|
6: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% |
|
6: ================================================================================ |
|
6: ============================= End of ROCm SMI Log ============================== |
|
7: Launching on nid006946 (7/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
4: Launching on nid006943 (4/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
6: Launching on nid006945 (6/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
3: Launching on nid006942 (3/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
0: Launching on nid006939 (0/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
5: Launching on nid006944 (5/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
1: Launching on nid006940 (1/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
2: Launching on nid006941 (2/8), master nid006939 port 9999, GPUs 8, CUDA: True |
|
7: > setting tensorboard ... |
|
0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 |
|
0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. |
|
0: using torch.bfloat16 for parameters ... |
|
0: ------------------------ arguments ------------------------ |
|
0: abort_on_unmet_fused_kernel_constraints ......... False |
|
0: accumulate_allreduce_grads_in_fp32 .............. True |
|
0: adam_beta1 ...................................... 0.9 |
|
0: adam_beta2 ...................................... 0.999 |
|
0: adam_eps ........................................ 1e-08 |
|
0: adlr_autoresume ................................. False |
|
0: adlr_autoresume_interval ........................ 1000 |
|
0: apply_query_key_layer_scaling ................... True |
|
0: apply_residual_connection_post_layernorm ........ False |
|
0: attention_dropout ............................... 0.1 |
|
0: attention_softmax_in_fp32 ....................... False |
|
0: bert_binary_head ................................ True |
|
0: bert_load ....................................... None |
|
0: bf16 ............................................ True |
|
0: bias_dropout_fusion ............................. True |
|
0: bias_gelu_fusion ................................ True |
|
0: biencoder_projection_dim ........................ 0 |
|
0: biencoder_shared_query_context_model ............ False |
|
0: block_data_path ................................. None |
|
0: checkpoint_activations .......................... True |
|
0: checkpoint_in_cpu ............................... False |
|
0: checkpoint_num_layers ........................... 1 |
|
0: clip_grad ....................................... 1.0 |
|
0: codecarbon_dir .................................. None |
|
0: consumed_train_samples .......................... 0 |
|
0: consumed_train_tokens ........................... 0 |
|
0: consumed_valid_samples .......................... 0 |
|
0: contigious_checkpointing ........................ False |
|
0: cpu_optimizer ................................... False |
|
0: cpu_torch_adam .................................. False |
|
0: curriculum_learning ............................. False |
|
0: data_impl ....................................... mmap |
|
0: data_parallel_size .............................. 64 |
|
0: data_path ....................................... None |
|
0: dataloader_type ................................. single |
|
0: DDP_impl ........................................ local |
|
0: decoder_seq_length .............................. None |
|
0: deepscale ....................................... False |
|
0: deepscale_config ................................ None |
|
0: deepspeed ....................................... True |
|
0: deepspeed_activation_checkpointing .............. False |
|
0: deepspeed_config ................................ ds_configs/3319491.json |
|
0: deepspeed_mpi ................................... False |
|
0: distribute_checkpointed_activations ............. False |
|
0: distributed_backend ............................. nccl |
|
0: embed_layernorm ................................. False |
|
0: embedding_path .................................. None |
|
0: encoder_seq_length .............................. 2048 |
|
0: eod_mask_loss ................................... False |
|
0: eval_interval ................................... 10000 |
|
0: eval_iters ...................................... 1 |
|
0: eval_only ....................................... None |
|
0: evidence_data_path .............................. None |
|
0: exit_duration_in_mins ........................... None |
|
0: exit_interval ................................... None |
|
0: ffn_hidden_size ................................. 3072 |
|
0: finetune ........................................ False |
|
0: fp16 ............................................ False |
|
0: fp16_lm_cross_entropy ........................... False |
|
0: fp32_residual_connection ........................ False |
|
0: gigaflos_no_embeds .............................. 0 |
|
0: global_batch_size ............................... 256 |
|
0: glu_activation .................................. None |
|
0: hidden_dropout .................................. 0.1 |
|
0: hidden_size ..................................... 768 |
|
0: hysteresis ...................................... 2 |
|
0: ict_head_size ................................... None |
|
0: ict_load ........................................ None |
|
0: img_dim ......................................... 224 |
|
0: indexer_batch_size .............................. 128 |
|
0: indexer_log_interval ............................ 1000 |
|
0: inference ....................................... False |
|
0: init_method_std ................................. 0.02 |
|
0: init_method_xavier_uniform ...................... False |
|
0: initial_loss_scale .............................. 4294967296 |
|
0: kill_switch_path ................................ kill-switch-146m174b100m |
|
0: kv_channels ..................................... 64 |
|
0: layer_norm_fusion ............................... True |
|
0: layernorm_epsilon ............................... 1e-05 |
|
0: lazy_mpu_init ................................... None |
|
0: load ............................................ checkpoints_146m174b100m |
|
0: local_rank ...................................... None |
|
0: log_batch_size_to_tensorboard ................... True |
|
0: log_interval .................................... 100 |
|
0: log_learning_rate_to_tensorboard ................ True |
|
0: log_level ....................................... None |
|
0: log_level_replica ............................... None |
|
0: log_loss_scale_to_tensorboard ................... True |
|
0: log_num_zeros_in_grad ........................... False |
|
0: log_params_norm ................................. False |
|
0: log_path ........................................ None |
|
0: log_timers_to_tensorboard ....................... True |
|
0: log_validation_ppl_to_tensorboard ............... True |
|
0: loss_on_targets_only ............................ False |
|
0: loss_scale ...................................... 12.0 |
|
0: loss_scale_window ............................... 1000 |
|
0: lr .............................................. 0.0002 |
|
0: lr_decay_iters .................................. None |
|
0: lr_decay_samples ................................ 84762549 |
|
0: lr_decay_style .................................. cosine |
|
0: lr_decay_tokens ................................. None |
|
0: lr_warmup_fraction .............................. None |
|
0: lr_warmup_iters ................................. 0 |
|
0: lr_warmup_samples ............................... 847625 |
|
0: make_vocab_size_divisible_by .................... 128 |
|
0: mask_prob ....................................... 0.15 |
|
0: masked_softmax_fusion ........................... True |
|
0: max_position_embeddings ......................... 2048 |
|
0: mean_noise_span_length .......................... None |
|
0: memory_centric_tiled_linear ..................... False |
|
0: merge_file ...................................... gpt2/merges.txt |
|
0: micro_batch_size ................................ 4 |
|
0: min_loss_scale .................................. 1.0 |
|
0: min_lr .......................................... 2e-05 |
|
0: mmap_warmup ..................................... False |
|
0: no_load_optim ................................... None |
|
0: no_load_rng ..................................... None |
|
0: no_save_optim ................................... None |
|
0: no_save_rng ..................................... None |
|
0: noise_density ................................... None |
|
0: num_attention_heads ............................. 12 |
|
0: num_channels .................................... 3 |
|
0: num_classes ..................................... 1000 |
|
0: num_layers ...................................... 15 |
|
0: num_layers_per_virtual_pipeline_stage ........... None |
|
0: num_workers ..................................... 2 |
|
0: onnx_safe ....................................... None |
|
0: openai_gelu ..................................... False |
|
0: optimizer ....................................... adam |
|
0: optimizer_fusion ................................ True |
|
0: override_lr_scheduler ........................... False |
|
0: pad_vocab_size_to ............................... None |
|
0: params_dtype .................................... torch.bfloat16 |
|
0: partition_activations ........................... False |
|
0: patch_dim ....................................... 16 |
|
0: pipeline_model_parallel_size .................... 1 |
|
0: position_embedding_type ......................... PositionEmbeddingType.absolute |
|
0: pp_partition_method ............................. None |
|
0: profile_backward ................................ False |
|
0: query_in_block_prob ............................. 0.1 |
|
0: rampup_batch_size ............................... None |
|
0: rank ............................................ 0 |
|
0: remote_device ................................... none |
|
0: reset_attention_mask ............................ False |
|
0: reset_position_ids .............................. False |
|
0: reset_progress .................................. None |
|
0: retriever_report_topk_accuracies ................ [] |
|
0: retriever_score_scaling ......................... False |
|
0: retriever_seq_length ............................ 256 |
|
0: reweight_loss_based_on_position_frequency ....... False |
|
0: sample_rate ..................................... 1.0 |
|
0: save ............................................ checkpoints_146m174b100m |
|
0: save_interval ................................... 10000 |
|
0: scatter_gather_tensors_in_pipeline .............. True |
|
0: scattered_embeddings ............................ False |
|
0: seed ............................................ 1234 |
|
0: seq_length ...................................... 2048 |
|
0: sgd_momentum .................................... 0.9 |
|
0: short_seq_prob .................................. 0.1 |
|
0: skip_train_iteration_range ...................... None |
|
0: split ........................................... None |
|
0: split_transformers .............................. False |
|
0: sync_tp_duplicated_parameters ................... False |
|
0: synchronize_each_layer .......................... False |
|
0: tensor_model_parallel_size ...................... 1 |
|
0: tensorboard_dir ................................. tensorboard_146m174b100m |
|
0: tensorboard_log_interval ........................ 1 |
|
0: tensorboard_queue_size .......................... 5 |
|
0: test_weighted_split_paths ....................... None |
|
0: test_weighted_split_paths_path .................. None |
|
0: tile_factor ..................................... 1 |
|
0: titles_data_path ................................ None |
|
0: tokenizer_name_or_path .......................... None |
|
0: tokenizer_type .................................. GPT2BPETokenizer |
|
0: train_iters ..................................... None |
|
0: train_samples ................................... 84762549 |
|
0: train_tokens .................................... None |
|
0: train_weighted_split_names ...................... ['train'] |
|
0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']] |
|
0: train_weighted_split_paths_path ................. None |
|
0: train_weighted_split_splits ..................... [['0:1']] |
|
0: train_weighted_split_weights .................... [['1.0']] |
|
0: universal_checkpoint ............................ False |
|
0: use_bnb_optimizer ............................... False |
|
0: use_checkpoint_lr_scheduler ..................... False |
|
0: use_contiguous_buffers_in_ddp ................... True |
|
0: use_cpu_initialization .......................... None |
|
0: use_one_sent_docs ............................... False |
|
0: use_pin_memory .................................. False |
|
0: valid_num_workers ............................... 2 |
|
0: valid_weighted_split_names ...................... ['validation'] |
|
0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']] |
|
0: valid_weighted_split_paths_path ................. None |
|
0: valid_weighted_split_splits ..................... [['0:1']] |
|
0: valid_weighted_split_weights .................... [['1.0']] |
|
0: virtual_pipeline_model_parallel_size ............ None |
|
0: vocab_extra_ids ................................. 0 |
|
0: vocab_file ...................................... gpt2/vocab.json |
|
0: weight_decay .................................... 0.1 |
|
0: world_size ...................................... 64 |
|
0: zero_allgather_bucket_size ...................... 0.0 |
|
0: zero_contigious_gradients ....................... False |
|
0: zero_reduce_bucket_size ......................... 0.0 |
|
0: zero_reduce_scatter ............................. False |
|
0: zero_stage ...................................... 0 |
|
0: -------------------- end of arguments --------------------- |
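The learning-rate arguments above (`lr`, `min_lr`, `lr_decay_style cosine`, `lr_decay_samples`, `lr_warmup_samples`) describe a sample-based schedule. The sketch below is a minimal illustration that assumes linear warmup from zero followed by cosine decay to `min_lr`. This is the common form of such a schedule, but it is not taken from this log or verified against the training code. `lr_at` is a hypothetical helper name.

```python
import math


def lr_at(consumed_samples, max_lr=2e-4, min_lr=2e-5,
          warmup_samples=847_625, decay_samples=84_762_549):
    """Learning rate after `consumed_samples` (assumed warmup + cosine)."""
    # Linear warmup from 0 to max_lr.
    if warmup_samples > 0 and consumed_samples <= warmup_samples:
        return max_lr * consumed_samples / warmup_samples
    # Past the decay horizon the rate stays at its floor.
    if consumed_samples > decay_samples:
        return min_lr
    # Cosine decay from max_lr down to min_lr.
    ratio = (consumed_samples - warmup_samples) / (decay_samples - warmup_samples)
    coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
    return min_lr + coeff * (max_lr - min_lr)


for samples in (0, 847_625, 42_000_000, 84_762_549):
    print(f"{samples:>12,} samples -> lr {lr_at(samples):.3e}")
```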
|
0: setting number of micro-batches to constant 1 |
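The constant of 1 reported above is consistent with dividing the global batch by the number of samples processed in one forward/backward pass across all data-parallel ranks. The sketch below reproduces that arithmetic from the logged arguments. `num_micro_batches` is a hypothetical helper, not the framework's own function.

```python
def num_micro_batches(global_batch_size, micro_batch_size, data_parallel_size):
    """Micro-batches (gradient-accumulation steps) per optimizer step."""
    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError("global batch size must be divisible by "
                         "micro_batch_size * data_parallel_size")
    return global_batch_size // samples_per_pass


# Values from the argument dump: 8 nodes x 8 GPUs, no model parallelism.
world_size = 64
tensor_parallel = 1
pipeline_parallel = 1
data_parallel = world_size // (tensor_parallel * pipeline_parallel)

steps = num_micro_batches(global_batch_size=256, micro_batch_size=4,
                          data_parallel_size=data_parallel)
print(f"data-parallel size: {data_parallel}, micro-batches per step: {steps}")
```

With 64 data-parallel ranks and a micro-batch of 4, one pass already covers all 256 samples, so no gradient accumulation is needed.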
|
0: > building GPT2BPETokenizer tokenizer ... |
|
0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) |
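The padded size above matches rounding the GPT-2 vocabulary up to the next multiple of `make_vocab_size_divisible_by` (128) times the tensor-parallel size (1). The sketch below reproduces that arithmetic. `pad_vocab_size` is a hypothetical helper written for illustration.

```python
def pad_vocab_size(orig_vocab_size, divisible_by=128, tensor_parallel_size=1):
    """Round the vocabulary size up to a multiple of divisible_by * TP size."""
    multiple = divisible_by * tensor_parallel_size
    # Ceiling division, then scale back up to the nearest multiple.
    return -(-orig_vocab_size // multiple) * multiple


padded = pad_vocab_size(50257)
dummy_tokens = padded - 50257
print(f"padded vocab: {padded}, dummy tokens: {dummy_tokens}")
```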
|
0: DeepSpeed general environment info: |
|
0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] |
|
0: torch version .................... 1.13.0+rocm5.2 |
|
0: torch cuda version ............... None |
|
0: torch hip version ................ 5.2.21151-afdc89f8 |
|
0: nvcc version ..................... None |
|
0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] |
|
0: deepspeed info ................... 0.7.5, unknown, unknown |
|
0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 |
|
0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** |
|
0: > initializing torch distributed ... |
|
0: [2023-03-17 13:53:41,482] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl |
|
0: > initializing tensor model parallel with size 1 |
|
0: > initializing pipeline model parallel with size 1 |
|
0: > setting random seeds to 1234 ... |
|
0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 |
|
0: > compiling dataset index builder ... |
|
0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' |
|
0: make: Nothing to be done for 'default'. |
|
0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' |
|
0: >>> done with dataset index builder. Compilation time: 0.065 seconds |
|
0: > compiling and loading fused kernels ... |
|
|