diff --git "a/logs/main_log.txt" "b/logs/main_log.txt" new file mode 100644--- /dev/null +++ "b/logs/main_log.txt" @@ -0,0 +1,43007 @@ +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
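The report above is the output of DeepSpeed's environment reporter, also shipped as the ds_report command-line tool. A sketch of invoking it programmatically, assuming the deepspeed.env_report entry point of the 0.5.x line recorded in this log (the module path may differ across versions):

    # Reprint the op-compatibility and environment report seen in this log.
    from deepspeed.env_report import main as ds_report

    ds_report()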
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
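Each row of the table is an op builder probe: the "installed" column tracks whether the op was pre-built at pip-install time ([YES] here only for cpu_adam), while [OKAY] under "compatible" means a JIT build should succeed. A sketch of probing two builders directly; the import path is an assumption (op builders have moved between a top-level op_builder package and deepspeed.ops.op_builder across releases), so verify it against the installed version:

    # Query op compatibility the same way the table does.
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder  # path may vary by version

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        print(builder.NAME, "->", "[OKAY]" if builder.is_compatible() else "[NO]")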
+ [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING] async_io: please install the libaio-devel package with yum
+ [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
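The async_io rows come out [NO] because the libaio development files are missing on the node. A standard-library probe for the runtime library, independent of DeepSpeed (the dev headers needed for the JIT build are a separate libaio-devel package):

    # Check whether the libaio shared object is resolvable at all.
    import ctypes.util

    print("libaio found:", ctypes.util.find_library("aio") is not None)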
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
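The torch fields of the environment block are read straight from torch's own metadata and can be reproduced directly:

    # Reproduce the torch-related fields of the report above.
    import torch

    print("torch install path ...", torch.__path__)
    print("torch version ........", torch.__version__)
    print("torch cuda version ...", torch.version.cuda)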
Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja + +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] + + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + +-------------------------------------------------- +op nameop name + op name ................op name................ installed................installed................ ....installed installed compatible compatible +.... 
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
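+Only cpu_adam (and utils, per the earlier report) were prebuilt into this wheel; the other ops will be JIT-compiled on first use. If prebuilding is preferred, DeepSpeed documents DS_BUILD_* install-time switches; a sketch under that assumption:
+
+import os
+import subprocess
+import sys
+
+# Sketch assuming DeepSpeed's documented DS_BUILD_* install-time switches;
+# the specific flag names used here are an assumption, not taken from this log.
+env = dict(os.environ)
+env["DS_BUILD_FUSED_ADAM"] = "1"   # fused_adam shows [NO] in the table above
+env["DS_BUILD_FUSED_LAMB"] = "1"   # fused_lamb shows [NO] in the table above
+subprocess.check_call(
+    [sys.executable, "-m", "pip", "install", "--no-cache-dir", "deepspeed"],
+    env=env,
+)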
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
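+The type: git: not found line explains the unknown hash and branch in the banner: git is not on the job's PATH. Roughly equivalent fallback logic in Python (a sketch, not Megatron's actual code):
+
+import shutil
+import subprocess
+
+# Sketch of the fallback behaviour seen above, not Megatron's implementation.
+def git_info():
+    if shutil.which("git") is None:
+        return "unknown", "unknown"
+    def run(*args):
+        out = subprocess.run(["git", *args], capture_output=True, text=True)
+        return out.stdout.strip() or "unknown"
+    return run("rev-parse", "--short", "HEAD"), run("rev-parse", "--abbrev-ref", "HEAD")
+
+git_hash, git_branch = git_info()
+print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")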
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
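+One detail worth noting in the report above: torch 1.8.1 was built against CUDA 11.1, while the system nvcc is 11.2, the toolchain any JIT build would use. The torch-side lines can be reproduced with public torch APIs (a sketch; the deepspeed lines come from its own reporting, e.g. the ds_report tool):
+
+import torch
+from torch.utils.cpp_extension import CUDA_HOME
+
+# Reproduces the torch-side lines of the report above; the deepspeed-side
+# lines are produced by deepspeed itself.
+print("torch install path ...............", torch.__path__)
+print("torch version ....................", torch.__version__)   # 1.8.1 in this log
+print("torch cuda version ...............", torch.version.cuda)  # 11.1 in this log
+print("nvcc lives under .................", CUDA_HOME)           # toolkit 11.2 here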
['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.1 +1.8.1 +torch cuda version torch cuda version............... ...............11.1 +11.1nvcc version + nvcc version..................... .....................11.2 +11.2deepspeed install path + deepspeed install path........... ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +...................deepspeed info 0.5.5+cd7967d, cd7967d, master................... + deepspeed wheel compiled w.0.5.5+cd7967d, cd7967d, master +......deepspeed wheel compiled w. torch 1.8, cuda 11.1 +...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ......async_io [OKAY] + ............... [NO] .......quantizer [NO].............. + [NO] ....... [OKAY] +-------------------------------------------------- +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +DeepSpeed general environment info: +torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +.................... 1.8.1 +torch version torch cuda version.................... ...............1.8.1 +11.1 +torch cuda versionnvcc version .................................... 11.111.2 + +nvcc versiondeepspeed install path ................................ 11.2 +['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']deepspeed install path + ...........deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +0.5.5+cd7967d, cd7967d, masterdeepspeed info + deepspeed wheel compiled w.................... ......0.5.5+cd7967d, cd7967d, master +torch 1.8, cuda 11.1deepspeed wheel compiled w. + ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op. +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- +JIT compiled ops requires ninja + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +JIT compiled ops requires ninja +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja + + + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found + [WARNING]  async_io: please install the libaio-devel package with yum +/bin/sh: line 0: type: git: not found + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +/bin/sh: line 0: type: git: not found +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- +--------------------------------------------------DeepSpeed C++/CUDA extension op report + + +DeepSpeed C++/CUDA extension op report +--------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +JIT compiled ops requires ninja +JIT compiled ops requires ninja + +-------------------------------------------------- +JIT compiled ops requires ninja +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found + [WARNING]  async_io: please install the libaio-devel package with yum +ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY] + +[OKAY] +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + +--------------------------------------------------op nameop name + op name ................ ................op name................ installedinstalled................installed .. installed.... ..compatiblecompatible compatible + + +compatible---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + +-------------------------------------------------- +cpu_adam cpu_adamcpu_adam............... cpu_adam .............................. [YES] ............... [YES]......[YES] [YES]............[OKAY] +...... [OKAY][OKAY] + +[OKAY] +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +fused_adam .............fused_adam fused_adam[NO]............. fused_adam ............. [NO] ....... .............[NO]....... [OKAY].......[NO][OKAY] + +[OKAY]....... +fused_lamb fused_lamb [OKAY] ............. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +............. fused_lamb[NO] fused_lamb[NO] ............. ....... .................... [NO][NO][OKAY][OKAY] + +.............. [OKAY] +[OKAY] +async_io ............... [NO] ....... [NO] +sparse_attn ............sparse_attn [NO]............ sparse_attn.......[NO] sparse_attn...................[OKAY] [OKAY] +[NO]............ + transformer....... transformer[NO] ............ [OKAY]............ ....... +transformer_inference .. [NO] ....... [OKAY] +[NO] [OKAY][NO]transformer....... + ...................[OKAY]transformer +utils .................. [YES] ...... [OKAY] + [NO][OKAY]............ + .......stochastic_transformer[NO] stochastic_transformer [OKAY] ........ + . [OKAY] [NO] +[NO] ..............stochastic_transformer stochastic_transformer [OKAY] [OKAY] + +.. [NO][NO] ....... .......[OKAY] +quantizer .............. [NO] ....... [OKAY] +[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum +/bin/sh: line 0: type: git: not found + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... 
[OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] [OKAY] + +[OKAY] +---------------------------------------------------------------------------------------------------- + +-------------------------------------------------- +op name-------------------------------------------------- op name +................ + ................op nameop nameinstalled installed.................................. compatible ..installed + installed--------------------------------------------------compatible +.. + ..compatible-------------------------------------------------- + +compatible +----------------------------------------------------------------------------------------------------cpu_adam + + ............... [YES]cpu_adam ..................... cpu_adamcpu_adam[OKAY] ............... +...............[YES] [YES] [YES]............ ......[OKAY] [OKAY] + +fused_adam[OKAY] +............. [NO] ....... [OKAY] +fused_adamfused_lambfused_adam fused_adam............. ............. ............. ............. [NO][NO] [NO][NO] ....... .............. ....... [OKAY] [OKAY][OKAY][OKAY] + + + +fused_lambfused_lambfused_lamb ....................................... [NO][NO] [NO] sparse_attn....... ....... ............ [OKAY]....... +[OKAY][NO] +[OKAY]....... + [OKAY] +transformer ............ [NO] sparse_attn....... sparse_attn............[OKAY] +............sparse_attn[NO] [NO]stochastic_transformer................... .......[OKAY].[NO] +[OKAY] +[NO]....... transformer transformer....... [OKAY][OKAY] ............ + +............ [NO][NO]transformer .......................... [OKAY][OKAY][NO] + + ....... [OKAY] +stochastic_transformerstochastic_transformer ..stochastic_transformer [NO][NO] . ....... .......[NO][OKAY] +.......[OKAY] +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +JIT compiled ops requires ninja-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. 
+ +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninjaninjaninja ...................................................... ninja[OKAY][OKAY][OKAY] + + +..................-------------------------------------------------- ---------------------------------------------------------------------------------------------------- +[OKAY] + +op name +op name op name................-------------------------------------------------- + ................ installedop name................ ..................installed installed installedcompatible.. + .. --------------------------------------------------.. compatiblecompatible + + +compatible---------------------------------------------------------------------------------------------------- + + +-------------------------------------------------- +cpu_adam ............... [YES] ......cpu_adam cpu_adam [OKAY] + cpu_adam.............................. ...............[YES][YES] [YES]...... ...... ...... fused_adam[OKAY] [OKAY][OKAY] + + +............. [NO] ....... [OKAY] +fused_lambfused_adamfused_adam fused_adam .......................... ............. [NO][NO] .................... [NO] .......[OKAY] + [NO][OKAY]....... +....... [OKAY][OKAY] + +fused_lamb ............. fused_lambfused_lamb[NO] sparse_attn............. ................................[NO] [NO] [OKAY][NO]....... + ....... ....... [OKAY] [OKAY] +[OKAY] + +transformer ............ [NO] sparse_attn....... ............[OKAY] +[NO]sparse_attn .......sparse_attnstochastic_transformer ............[OKAY] + ............transformer[NO] . [NO]...................[NO] [NO][OKAY]....... ....... ....... +[OKAY] [OKAY] + +[OKAY]transformer + ............ stochastic_transformertransformer[NO] ............. ....... [NO] [NO] [OKAY] ....... +....... [OKAY][OKAY]stochastic_transformer + + . stochastic_transformer[NO] ........ [OKAY][NO] + ....... [OKAY] +/bin/sh: line 0: type: git: not found + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + +async_ioasync_io .............................. [NO][NO] ....... .......[NO] +[NO] +transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY] +....... [OKAY] +utils .................. utils[YES] ........................ [YES][OKAY] +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +...... [OKAY] +quantizer .............. quantizer[NO] ..................... [NO][OKAY] +....... [OKAY] +-------------------------------------------------- +-------------------------------------------------- +ninjaninja .................................... [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- + +op nameop name ................................ installedinstalled .... compatiblecompatible + +---------------------------------------------------------------------------------------------------- + +cpu_adam ...............cpu_adam [YES]............... ......[YES] ......[OKAY] +[OKAY] +fused_adam .............fused_adam [NO]............. .......[NO] [OKAY]....... 
+ [OKAY] +fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] + +sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] + +transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] + +stochastic_transformerstochastic_transformer . .[NO] [NO]....... .......[OKAY] +[OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed ninja.. compatible +.................. --------------------------------------------------[OKAY] + +-------------------------------------------------- +op name ................ installedcpu_adam ................. compatible[YES] + ......-------------------------------------------------- +[OKAY] +cpu_adam ...............fused_adam [YES]............. ......[NO] [OKAY]....... + [OKAY] +fused_lamb ............. [NO] fused_adam....... .............[OKAY] +[NO] ....... [OKAY] +fused_lamb ............. [NO] .......sparse_attn [OKAY]............ + [NO] ....... [OKAY] +transformer ............ [NO] .......sparse_attn [OKAY]............ + [NO] ....... stochastic_transformer[OKAY] +. [NO]transformer ................... [OKAY][NO] + ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+cd7967d, cd7967d, master0.5.5+cd7967d, cd7967d, master + +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... DeepSpeed general environment info: +['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch install pathtorch version .................... ...............1.8.1 +torch cuda version ...............['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +11.1 +nvcc versiontorch version ......................................... 11.21.8.1 + +deepspeed install path ...........torch cuda version ...............['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +11.1deepspeed info + nvcc version................... .....................0.5.5+cd7967d, cd7967d, master +11.2 +deepspeed wheel compiled w. deepspeed install path...... ...........torch 1.8, cuda 11.1 +['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +DeepSpeed general environment info: +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +torch install pathDeepSpeed general environment info: ............... + torch install path['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +............... torch version .................... 
1.8.1['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch cuda version torch version............... ....................11.1 +1.8.1nvcc version + ..................... torch cuda version11.2 +...............deepspeed install path 11.1........... + nvcc version ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']..................... + deepspeed info11.2 +...................deepspeed install path 0.5.5+cd7967d, cd7967d, master........... + deepspeed wheel compiled w. ......['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +torch 1.8, cuda 11.1deepspeed info + ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report + + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +JIT compiled ops requires ninjaJIT compiled ops requires ninja +JIT compiled ops requires ninja + +ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] +[OKAY][OKAY] + +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +--------------------------------------------------op name + + op name................ op name................op name installed................installed................ ....installed installed compatible.. compatible .. +compatible + ---------------------------------------------------------------------------------------------------- + +compatible +-------------------------------------------------- + +-------------------------------------------------- +cpu_adam ...............cpu_adam cpu_adam [YES]............... cpu_adam ...............[YES] .....................[YES] ......[YES][OKAY]...... +[OKAY] ...... +[OKAY] +[OKAY] +fused_adamfused_adam ..........................fused_adam [NO]fused_adam [NO] .......................... ....... .......[NO] [NO] [OKAY] [OKAY] +....... +....... [OKAY][OKAY]fused_lambfused_lamb + + ............. fused_lamb.............[NO] [NO].......fused_lamb............. [OKAY].......[NO]............. + [OKAY][NO] +....... [OKAY] + ....... [OKAY] +sparse_attn ............sparse_attn sparse_attn[NO]............ ...................[NO] sparse_attn[OKAY].......[NO] + [OKAY]....... + transformer............[OKAY]transformer ............ +[NO] ............[NO] transformer .......................... [NO] [OKAY] + [NO].......[OKAY] +.......[OKAY]stochastic_transformer transformer + [OKAY] +............stochastic_transformer. [NO]stochastic_transformer[NO] ................ [OKAY][NO] [OKAY]....... + +[NO] [OKAY]....... + [OKAY]stochastic_transformer + . [NO] ....... [OKAY] +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +------------------------------------------------------------------------------------------------------------------------------------------------------ + +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +-------------------------------------------------- + +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +--------------------------------------------------JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja + + +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] + + + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + + +op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled .. .. .... compatiblecompatible + +compatiblecompatible-------------------------------------------------- +-------------------------------------------------- + + +---------------------------------------------------------------------------------------------------- + +cpu_adam cpu_adam............... cpu_adam[YES] cpu_adam ............... ............... ..................... [YES][YES][YES][OKAY] .................. + [OKAY][OKAY][OKAY] + + +fused_adam ............. [NO] fused_adamfused_adam .......fused_adam .......................... [OKAY]............. [NO] +[NO] [NO] ....... .......fused_lamb....... [OKAY][OKAY]............. +[OKAY] + +[NO] ....... fused_lamb[OKAY]fused_lamb +fused_lamb ....................................... [NO][NO][NO] ..................... [OKAY]sparse_attn[OKAY][OKAY] + +............ + [NO] ....... [OKAY] +transformer ............ [NO]sparse_attn sparse_attn....... sparse_attn ........................ [OKAY] +............[NO] [NO] [NO] .......stochastic_transformer ....... .......[OKAY] . +[OKAY] [OKAY] +[NO] +transformer transformer ....... transformer............ ............ [OKAY] [NO]............ + [NO][NO]....... ....... .......[OKAY][OKAY] + +[OKAY] +stochastic_transformerstochastic_transformer stochastic_transformer ... [NO][NO] .......[NO]....... [OKAY].......[OKAY] + +[OKAY] +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +---------------------------------------------------------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + +JIT compiled ops requires ninja +--------------------------------------------------DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report + + +JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] +[OKAY] + + +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +--------------------------------------------------op name +................op name + op name................installedop name .. installed................ ................ compatible ..installed +installed compatible-------------------------------------------------- + .. +.. compatiblecompatible +-------------------------------------------------- +-------------------------------------------------- + +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY]cpu_adamcpu_adam + .............................. cpu_adam [YES][YES] ............ fused_adam[OKAY] +............................[OKAY] +[NO] ....... [OKAY] +[YES] ......fused_lambfused_adam [OKAY] .............fused_adam [NO]............. +.................... [NO][NO][OKAY] +.............. [OKAY][OKAY] + +fused_adamfused_lambfused_lamb ............. [NO] sparse_attn....... ............ [OKAY][NO] ............. + .......[NO] [OKAY].................... + [OKAY]transformersparse_attn ............ +............ [NO] [NO][NO] .............. [OKAY] +.......sparse_attn [OKAY][OKAY]............stochastic_transformer + +. transformer [NO] [NO] .......................... fused_lamb[NO][OKAY][OKAY] + +....... 
transformer[OKAY] +............ [NO]............. stochastic_transformer ....... [NO] .[OKAY]....... [NO] + ....... [OKAY][OKAY] + +stochastic_transformer . [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- + + +DeepSpeed C++/CUDA extension op report +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +JIT compiled ops requires ninja + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] + +[OKAY] + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + +op name +op name op name op name ................................ ................ ................ installedinstalled installed installed .... .. compatiblecompatible..compatible + + + ----------------------------------------------------------------------------------------------------compatible-------------------------------------------------- + + + +-------------------------------------------------- +cpu_adamcpu_adam cpu_adam.............................. ...............cpu_adam[YES][YES] [YES] ..................... ...... ...... [OKAY][YES] [OKAY] + +......[OKAY] +[OKAY] +fused_adam ............. fused_adam[NO] .................... fused_adam [NO] fused_adam[OKAY] ............. +.................... [NO][NO][OKAY] ....... +fused_lamb....... [OKAY]fused_lamb[OKAY]............. + + .............[NO] [NO]....... .......fused_lamb[OKAY]fused_lamb [OKAY] +.......................... + [NO][NO] ....... 
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
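+The same fields can be collected in-process from public attributes; a rough sketch (assuming only torch.__version__, torch.version.cuda, deepspeed.__version__ and package __path__ values; nvcc may be absent on a compute node):
+
+# Sketch: reproduce the "DeepSpeed general environment info" fields above.
+import subprocess
+import torch
+import deepspeed
+
+print("torch install path ...............", list(torch.__path__))
+print("torch version ....................", torch.__version__)
+print("torch cuda version ...............", torch.version.cuda)
+try:
+    # Last line of `nvcc --version` carries the release string.
+    nvcc = subprocess.run(["nvcc", "--version"], capture_output=True,
+                          text=True).stdout.splitlines()[-1]
+except FileNotFoundError:
+    nvcc = "not found"
+print("nvcc version .....................", nvcc)
+print("deepspeed install path ...........", list(deepspeed.__path__))
+print("deepspeed info ...................", deepspeed.__version__)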
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
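+The git_hash=unknown line above is the fallback printed when git is missing from the image; a sketch of that degrade-gracefully pattern (the helper name is illustrative, not Megatron's actual function):
+
+# Sketch: ask git for the current revision, fall back to "unknown" when the
+# binary is missing or the directory is not a repository.
+import subprocess
+
+def git_hash(default="unknown"):
+    try:
+        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
+                             capture_output=True, text=True, check=True)
+        return out.stdout.strip()
+    except (FileNotFoundError, subprocess.CalledProcessError):
+        return default
+
+print(f"**** Git info for Megatron: git_hash={git_hash()} ****")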
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
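+The async_io [NO] status above traces back to the missing libaio runtime; a quick preflight check approximating that probe (ctypes only locates the shared library, while DeepSpeed's real build probe also needs the libaio headers):
+
+# Sketch: check whether the libaio runtime library is resolvable at all.
+import ctypes.util
+
+def libaio_present():
+    # Returns e.g. "libaio.so.1" when the runtime library exists, else None;
+    # the libaio-devel package additionally provides the headers.
+    return ctypes.util.find_library("aio") is not None
+
+print("libaio found:", libaio_present())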
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+ runtime if needed. Op compatibility means that your system
+ meets the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
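Per the note above, [NO] under "installed" is not an error: compatible ops are built on first use. A sketch of querying the same compatibility column programmatically, assuming the op_builder classes bundled with this DeepSpeed version:

    from deepspeed.ops.op_builder import (
        CPUAdamBuilder,
        FusedAdamBuilder,
        FusedLambBuilder,
    )

    for builder in (CPUAdamBuilder(), FusedAdamBuilder(), FusedLambBuilder()):
        # is_compatible() answers "could ninja JIT-build this op here?",
        # i.e. the [OKAY] column, not whether a prebuilt wheel shipped it.
        print(builder.name, builder.is_compatible())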
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
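The table lists fused_adam as not preinstalled but compatible; in practice the CUDA extension is compiled by ninja the first time the optimizer is constructed, which is why the report insists on ninja. A minimal sketch (requires a GPU; FusedAdam import path as in this DeepSpeed version):

    import torch
    from deepspeed.ops.adam import FusedAdam

    # First construction JIT-builds the fused_adam op via ninja.
    params = [torch.nn.Parameter(torch.zeros(16, device="cuda"))]
    optimizer = FusedAdam(params, lr=1e-3)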
[NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- + + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + +JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja + + +ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY][OKAY] + + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + + +op name op nameop nameop name................ ................................................installed installed installed.. installed compatible .. +.. .. compatible --------------------------------------------------compatiblecompatible + + + +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + +cpu_adam ...............cpu_adam cpu_adam[YES]cpu_adam .............................. ............... ......[YES] [YES] [YES] [OKAY] ............ +...... [OKAY][OKAY][OKAY] + + +fused_adam ............. [NO] ....... fused_adamfused_adamfused_adam [OKAY] +....................................... fused_lamb [NO][NO][NO] ........................... [NO]....... [OKAY]....... + [OKAY][OKAY][OKAY] + +fused_lamb + ............. 
[NO]fused_lamb fused_lamb.................... .............[NO][OKAY] sparse_attn +.......[NO] ............[OKAY]....... + [OKAY][NO] + ....... [OKAY] +sparse_attntransformer ............ ............[NO]sparse_attn sparse_attn .......[NO] ............ ............[OKAY] + .......[NO][NO] transformer .......[OKAY] ....... +............ [OKAY] [OKAY][NO] + +stochastic_transformer .......transformertransformer . [OKAY] ........................ + [NO] [NO] [NO] stochastic_transformer....... ....... ........[OKAY] [OKAY] + +[NO][OKAY] +.......stochastic_transformer [OKAY] +.stochastic_transformer [NO] ........ [OKAY] +[NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+cd7967d, cd7967d, master0.5.5+cd7967d, cd7967d, master + +deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 +torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + +DeepSpeed general environment info: +torch install path ............... 
['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... 
[OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +JIT compiled ops requires ninja-------------------------------------------------- + +---------------------------------------------------------------------------------------------------- + + +JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja + + +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + + +---------------------------------------------------------------------------------------------------- +----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +ninjaninjaninjaninja .................. .................. ..................[OKAY] .................. + [OKAY][OKAY][OKAY]-------------------------------------------------- + + + +--------------------------------------------------op name +-------------------------------------------------- -------------------------------------------------- + +................op name op nameop name installed .................................................. installed installedinstalledcompatible .. +.. ..--------------------------------------------------compatible +compatible + +--------------------------------------------------compatible-------------------------------------------------- + + +--------------------------------------------------cpu_adam + ............... [YES] ......cpu_adam cpu_adam[OKAY] +..............................cpu_adam [YES][YES]............... ............[YES] [OKAY]fused_adam [OKAY] + ...... +............. [OKAY][NO] + ....... [OKAY] +fused_adam .............fused_lambfused_adam [NO]............. fused_adam .................... [NO] ............. [NO] [OKAY]....... +[NO]....... .......[OKAY]fused_lamb[OKAY] + +.............[OKAY] +fused_lamb[NO] fused_lamb.................... [OKAY].............[NO] + [NO]sparse_attn....... ...................[OKAY] [OKAY] +[NO] + .......sparse_attn [OKAY] +............ [NO]transformer ................... [OKAY][NO] +sparse_attn sparse_attn....... transformer [OKAY]............ +............ ............ [NO][NO][NO] stochastic_transformer ....... .............. .[OKAY][OKAY] +[OKAY] +[NO] + .......transformer [OKAY]transformer stochastic_transformer +............ ............[NO] . [NO] ....... [NO] ....... [OKAY]....... + [OKAY][OKAY] + +stochastic_transformer stochastic_transformer. [NO]. .......[NO] [OKAY]....... + [OKAY] +ninjaninjaninja ninja.................. .................. ..................[OKAY].................. + [OKAY][OKAY][OKAY]-------------------------------------------------- + + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +op name + + op nameop name................op name ................................installed ................ installed installed .. installed.. .. 
compatible..compatible + +compatible -------------------------------------------------- +-------------------------------------------------- +compatible-------------------------------------------------- + + +-------------------------------------------------- +cpu_adam ............... cpu_adam[YES] cpu_adam............... ...... cpu_adam............... [YES] [OKAY] ...............[YES] +...... [YES]......[OKAY] +......[OKAY] + [OKAY]fused_adam + ............. [NO] ....... [OKAY] +fused_adam .............fused_lamb fused_adam [NO] ..........................fused_adam ....... [NO] [NO]............. [OKAY] +.......[NO]....... [OKAY] fused_lamb[OKAY] + ....... +............. [OKAY][NO] +fused_lamb .................... fused_lamb[NO][OKAY] +....................sparse_attn [OKAY][NO]............ + .......[NO] [OKAY]....... + [OKAY] +sparse_attntransformer ........................ [NO][NO] .......sparse_attn....... ............[OKAY][OKAY] + +sparse_attn[NO] transformer................... stochastic_transformer ............ [OKAY][NO] +. [NO].......[NO] transformer[OKAY]....... + ...................[OKAY] [NO] +transformer[OKAY] ....... +............ stochastic_transformer [OKAY][NO] + ....... .[OKAY] stochastic_transformer +[NO] ........stochastic_transformer [NO] [OKAY]........ + [NO][OKAY] +....... [OKAY] +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report +-------------------------------------------------- + + +----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja + + + +JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja + + +JIT compiled ops requires ninja +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +ninjaninjaninjaninja .................. ...................................................... [OKAY][OKAY][OKAY][OKAY] + + + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + + +op nameop nameop nameop name ................................................ ................ installed installed installedinstalled .. .. ..compatible .. + compatiblecompatible-------------------------------------------------- + +compatible +---------------------------------------------------------------------------------------------------- + + +-------------------------------------------------- +cpu_adam ............... cpu_adamcpu_adam[YES]cpu_adam ............... ..................... ...............[YES][OKAY] [YES]...... +[YES] ......[OKAY]...... + [OKAY][OKAY] + +fused_adam .............fused_adam fused_adam .............[NO]fused_adam [NO]................................. .......[NO][OKAY][NO] +[OKAY] ....... +....... [OKAY]fused_lamb[OKAY] +fused_lamb +............. .............fused_lamb[NO] fused_lamb [NO]....... .......................... [OKAY] ....... +[NO] [NO] [OKAY] ....... +....... [OKAY][OKAY] + +sparse_attn ............sparse_attn [NO]sparse_attnsparse_attn............ ....... ............ ............[NO][OKAY][NO] + .......[NO]....... transformer[OKAY].......[OKAY] + +............[OKAY]transformer transformer + [NO] transformer........................ [NO][NO]................... .......[NO]....... [OKAY] [OKAY] +[OKAY] +....... + stochastic_transformer[OKAY] +stochastic_transformerstochastic_transformer. [NO]..stochastic_transformer .......[NO] [NO]. [OKAY]....... .......[NO] + [OKAY][OKAY]....... + + [OKAY] +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.1 + nvcc version11.1 +..................... nvcc version11.2 +..................... 
deepspeed install path11.2 +...........deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']deepspeed info + deepspeed info................... ...................0.5.5+cd7967d, cd7967d, master +0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +-------------------------------------------------- +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report +JIT compiled ops requires ninja +-------------------------------------------------- +-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. 
+--------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja + + + +----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja + + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY] +[OKAY][OKAY] + +-------------------------------------------------- +-------------------------------------------------- + +---------------------------------------------------------------------------------------------------- +op name +op name op name ................ op name ................................ installed installed ................ installed .... installed ..compatiblecompatible.. + + --------------------------------------------------compatible +--------------------------------------------------compatible + + +-------------------------------------------------- +-------------------------------------------------- +cpu_adam cpu_adam............... cpu_adam............... cpu_adam [YES]...............[YES] [YES]..................... ...... ...... [OKAY] [OKAY][YES] +[OKAY] +...... + [OKAY] +ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY] +[OKAY]-------------------------------------------------- + + + +------------------------------------------------------------------------------------------------------------------------------------------------------op name + + +fused_adamfused_adam fused_adam .............fused_adam ............. ............. [NO] .............[NO][NO] .......[NO]....... ....... [OKAY] [OKAY] +[OKAY]....... + +op name................op name op name ................installed ................ ................ installed installed....installed compatible..compatible.. + -------------------------------------------------- + compatible +--------------------------------------------------compatible + + +---------------------------------------------------------------------------------------------------- + + [OKAY] +cpu_adam ............... cpu_adam[YES] cpu_adam...............cpu_adam...... ...............[YES] ............... [OKAY]......[YES] [YES][OKAY]...... + + ......[OKAY] +fused_lambfused_lamb fused_lamb............. fused_lamb.............[NO] .............[NO]............. .......[NO].......[NO] [OKAY].......[OKAY]....... +[OKAY] + +[OKAY] +[OKAY] +fused_adam fused_adam.............fused_adam fused_adam[NO] ............. .................... ............. [OKAY][NO][NO] +sparse_attnsparse_attnsparse_attn sparse_attn ........................ ............[NO] ............ [NO] [NO]....... [NO][OKAY] ....... + [NO]....... .......[OKAY]fused_lamb....... + .............[OKAY] +..............[OKAY] +[OKAY]transformer + [OKAY]transformer +fused_lamb[NO][OKAY] +............ transformer ............ ............[NO]transformer[NO] .......[NO]................... [OKAY].......[OKAY][NO] + +fused_lamb.................... fused_lamb.............[NO] [OKAY]....... [OKAY]............. +[NO] + [NO]....... ....... [OKAY][OKAY] + .......[OKAY] +[OKAY]stochastic_transformer + +stochastic_transformer stochastic_transformer. . 
stochastic_transformer[NO].[NO] ..............[NO] .[OKAY] [OKAY]....... + +sparse_attnsparse_attn ............ ............[NO] .......[NO] sparse_attn[OKAY]....... +[NO] [OKAY]....... + [OKAY] + ............[OKAY]sparse_attn + transformer[NO] ............transformer................... [NO][NO] ................... [OKAY] [NO].......[OKAY] + +....... transformer[OKAY] [OKAY] +............stochastic_transformer +stochastic_transformer transformer [NO]. ....................[NO] [NO] ....... [NO] .......[OKAY] [OKAY] +.......[OKAY] + +stochastic_transformer[OKAY] + . [NO]stochastic_transformer ....... [OKAY]. + [NO] ....... [OKAY] +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- + + +JIT compiled ops requires ninja--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja ninja.................. [OKAY].................. 
+ [OKAY]-------------------------------------------------- + +--------------------------------------------------op name + ................op name installed................ ..installed compatible +.. --------------------------------------------------compatible + +-------------------------------------------------- +cpu_adam ............... cpu_adam[YES] ..................... [YES][OKAY] +...... [OKAY] +fused_adam ............. [NO]fused_adam .................... [OKAY][NO] + ....... [OKAY]fused_lamb + ............. [NO] fused_lamb....... .............[OKAY] +[NO] ....... [OKAY] +sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] +[OKAY] +transformer ............transformer [NO]............ .......[NO] [OKAY]....... + [OKAY] +stochastic_transformer stochastic_transformer . [NO]. .......[NO] [OKAY]....... + [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... ninja[OKAY] + .................. [OKAY] +-------------------------------------------------- +op name ................ installed ..sparse_attn compatible............ + [NO]-------------------------------------------------- +....... [OKAY] +transformer ............ cpu_adam[NO] ...................... [YES][OKAY] +...... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- +--------------------------------------------------JIT compiled ops requires ninja + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +--------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- +-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +------------------------------------------------------------------------------------------------------------------------------------------------------ + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + + +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. ......................................................[OKAY] [OKAY] +[OKAY][OKAY] + +-------------------------------------------------- + +------------------------------------------------------------------------------------------------------------------------------------------------------ +op name + + op nameop name................op name ................installed................ ................ .. installedinstalled installed ..compatible compatible.. + +.. -------------------------------------------------- -------------------------------------------------- +compatible +compatible + +---------------------------------------------------------------------------------------------------- + +cpu_adam cpu_adam............... ...............[YES]cpu_adam ......cpu_adam[YES] .............................. [OKAY]...... + [YES][YES][OKAY] +............ [OKAY][OKAY] + +fused_adam ............. fused_adam[NO] .................... [NO]fused_adamfused_adam[OKAY] +.................... ............. [OKAY]fused_lamb [NO] + [NO] ............. fused_lamb.......[NO]....... ....................[OKAY][OKAY] +[NO] +[OKAY] +....... [OKAY] +fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] + +sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] +[OKAY] +sparse_attntransformertransformer sparse_attn ........................ ............ [NO]............ [NO].......[NO] [OKAY]....... +[NO]....... [OKAY]stochastic_transformer[OKAY]....... + + [OKAY]. +transformer [NO]stochastic_transformertransformer ............ ........ ............[NO] [OKAY] +[NO][NO] ..................... 
[OKAY][OKAY][OKAY] + + +stochastic_transformer stochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] + +ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] +[OKAY] + +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- + +op name--------------------------------------------------op name op name + ................ ................op name ................ installed installed..................installed compatible.... installed +compatible -------------------------------------------------- +compatible.. +-------------------------------------------------- + +compatible-------------------------------------------------- + +-------------------------------------------------- +cpu_adam ...............cpu_adam [YES]...............cpu_adam cpu_adam......[YES] ...............[OKAY]............... ...... + [YES] [YES] [OKAY] ...... +...... [OKAY] +[OKAY] +fused_adam ............. [NO] ....... fused_adam[OKAY]fused_adam +fused_adam ............. ............. ............. [NO]fused_lamb[NO] .................... [NO] .......[OKAY][NO]....... ....... +[OKAY] +[OKAY][OKAY] + +fused_lambfused_lamb ............. .............[NO] fused_lamb [NO] ....... ............. ....... [NO][OKAY]sparse_attn + [OKAY]................... + [NO][OKAY] +....... [OKAY] +transformersparse_attn ........................sparse_attn [NO][NO] ............ sparse_attn.............. ............ [OKAY][NO] [OKAY] + +[NO].......transformer [OKAY]................... +stochastic_transformer [NO][OKAY]transformer . + ...................[NO]transformer [OKAY] [NO]....... +............ .......[OKAY][NO] stochastic_transformer + .......[OKAY] +.[OKAY] +[NO]stochastic_transformer stochastic_transformer ....... .[OKAY]. + [NO][NO] .............. [OKAY][OKAY] + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... 
[NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
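
Every rank prints this op-compatibility table and "DeepSpeed general environment info" block at startup; it is the same report the ds_report utility produces. The async_io ....... [NO] entry only records that the optional libaio-backed op was neither pre-built nor buildable here; it is not fatal for this run. A minimal sketch of the underlying per-op check, assuming the op_builder import path of DeepSpeed 0.5.x:

    # Sketch only; AsyncIOBuilder's import path is assumed from DeepSpeed 0.5.x.
    from deepspeed.ops.op_builder import AsyncIOBuilder

    # is_compatible() is the check behind the [NO]/[OKAY] column above; it
    # returns False on this system because the libaio dev headers are missing.
    print(AsyncIOBuilder().is_compatible())
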
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
+using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
+using torch.float16 for parameters ...
+------------------------ arguments ------------------------
+  accumulate_allreduce_grads_in_fp32 .............. False
+  adam_beta1 ...................................... 0.9
+  adam_beta2 ...................................... 0.95
+  adam_eps ........................................ 1e-08
+  adlr_autoresume ................................. False
+  adlr_autoresume_interval ........................ 1000
+  apply_query_key_layer_scaling ................... True
+  apply_residual_connection_post_layernorm ........ False
+  attention_dropout ............................... 0.1
+  attention_softmax_in_fp32 ....................... False
+  bert_binary_head ................................ True
+  bert_load ....................................... None
+  bf16 ............................................ False
+  bias_dropout_fusion ............................. True
+  bias_gelu_fusion ................................ True
+  biencoder_projection_dim ........................ 0
+  biencoder_shared_query_context_model ............ False
+  block_data_path ................................. None
+  checkpoint_activations .......................... True
+  checkpoint_in_cpu ............................... False
+  checkpoint_num_layers ........................... 1
+  clip_grad ....................................... 1.0
+  codecarbon_dir .................................. None
+  consumed_train_samples .......................... 0
+  consumed_train_tokens ........................... 0
+  consumed_valid_samples .......................... 0
+  contigious_checkpointing ........................ False
+  cpu_optimizer ................................... False
+  cpu_torch_adam .................................. False
+  curriculum_learning ............................. False
+  data_impl ....................................... mmap
+  data_parallel_size .............................. 1
+  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
+  dataloader_type ................................. single
+  DDP_impl ........................................ local
+  decoder_seq_length .............................. None
+  deepscale ....................................... False
+  deepscale_config ................................ None
+  deepspeed ....................................... True
+  deepspeed_activation_checkpointing .............. True
+  deepspeed_config ................................ ./ds_config.1504412.json
+  deepspeed_mpi ................................... False
+  distribute_checkpointed_activations ............. False
+  distributed_backend ............................. nccl
+  embedding_path .................................. None
+  encoder_seq_length .............................. 2048
+  eod_mask_loss ................................... False
+  eval_interval ................................... 1000
+  eval_iters ...................................... 5
+  evidence_data_path .............................. None
+  exit_duration_in_mins ........................... 1190
+  exit_interval ................................... None
+  ffn_hidden_size ................................. 46400
+  finetune ........................................ False
+  fp16 ............................................ True
+  fp16_lm_cross_entropy ........................... False
+  fp32_residual_connection ........................ False
+  gigaflos_no_embeds .............................. 0
+  global_batch_size ............................... 2048
+  glu_activation .................................. None
+  hidden_dropout .................................. 0.1
+  hidden_size ..................................... 11600
+  hysteresis ...................................... 2
+  ict_head_size ................................... None
+  ict_load ........................................ None
+  img_dim ......................................... 224
+  indexer_batch_size .............................. 128
+  indexer_log_interval ............................ 1000
+  init_method_std ................................. 0.02
+  init_method_xavier_uniform ...................... False
+  initial_loss_scale .............................. 4294967296
+  kv_channels ..................................... 145
+  layernorm_epsilon ............................... 1e-05
+  lazy_mpu_init ................................... None
+  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+  local_rank ...................................... 0
+  log_batch_size_to_tensorboard ................... True
+  log_interval .................................... 1
+  log_learning_rate_to_tensorboard ................ True
+  log_loss_scale_to_tensorboard ................... True
+  log_num_zeros_in_grad ........................... False
+  log_params_norm ................................. False
+  log_timers_to_tensorboard ....................... True
+  log_validation_ppl_to_tensorboard ............... True
+  loss_on_targets_only ............................ False
+  loss_scale ...................................... 12.0
+  loss_scale_window ............................... 1000
+  lr .............................................. 6e-05
+  lr_decay_iters .................................. None
+  lr_decay_samples ................................ None
+  lr_decay_style .................................. cosine
+  lr_decay_tokens ................................. 260000000000
+  lr_warmup_fraction .............................. None
+  lr_warmup_iters ................................. 0
+  lr_warmup_samples ............................... 216320
+  make_vocab_size_divisible_by .................... 128
+  mask_prob ....................................... 0.15
+  masked_softmax_fusion ........................... True
+  max_position_embeddings ......................... 2048
+  memory_centric_tiled_linear ..................... False
+  merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
+  micro_batch_size ................................ 1
+  min_loss_scale .................................. 1.0
+  min_lr .......................................... 6e-06
+  mmap_warmup ..................................... False
+  no_load_optim ................................... None
+  no_load_rng ..................................... None
+  no_save_optim ................................... None
+  no_save_rng ..................................... None
+  num_attention_heads ............................. 80
+  num_channels .................................... 3
+  num_classes ..................................... 1000
+  num_layers ...................................... 64
+  num_layers_per_virtual_pipeline_stage ........... None
+  num_workers ..................................... 2
+  onnx_safe ....................................... None
+  openai_gelu ..................................... False
+  optimizer ....................................... adam
+  override_lr_scheduler ........................... False
+  params_dtype .................................... torch.float16
+  partition_activations ........................... False
+  patch_dim ....................................... 16
+  pipeline_model_parallel_size .................... 32
+  position_embedding_type ......................... PositionEmbeddingType.absolute
+  profile_backward ................................ False
+  query_in_block_prob ............................. 0.1
+  rampup_batch_size ............................... None
+  rank ............................................ 0
+  remote_device ................................... none
+  reset_attention_mask ............................ False
+  reset_position_ids .............................. False
+  retriever_report_topk_accuracies ................ []
+  retriever_score_scaling ......................... False
+  retriever_seq_length ............................ 256
+  sample_rate ..................................... 1.0
+  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+  save_interval ................................... 300
+  scatter_gather_tensors_in_pipeline .............. True
+  scattered_embeddings ............................ False
+  seed ............................................ 43
+  seq_length ...................................... 2048
+  sgd_momentum .................................... 0.9
+  short_seq_prob .................................. 0.1
+  split ........................................... 949,50,1
+  split_transformers .............................. False
+  synchronize_each_layer .......................... False
+  tensor_model_parallel_size ...................... 4
+  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard
+  tensorboard_log_interval ........................ 1
+  tensorboard_queue_size .......................... 5
+  tile_factor ..................................... 1
+  titles_data_path ................................ None
+  tokenizer_name_or_path .......................... None
+  tokenizer_type .................................. GPT2BPETokenizer
+  train_iters ..................................... None
+  train_samples ................................... 600000000
+  train_tokens .................................... 300000000000
+  use_checkpoint_lr_scheduler ..................... False
+  use_contiguous_buffers_in_ddp ................... False
+  use_cpu_initialization .......................... None
+  use_one_sent_docs ............................... False
+  use_pin_memory .................................. False
+  virtual_pipeline_model_parallel_size ............ None
+  vocab_extra_ids ................................. 0
+  vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
+  weight_decay .................................... 0.1
+  world_size ...................................... 128
+  zero_allgather_bucket_size ...................... 0.0
+  zero_contigious_gradients ....................... False
+  zero_reduce_bucket_size ......................... 0.0
+  zero_reduce_scatter ............................. False
+  zero_stage ...................................... 1
+-------------------- end of arguments ---------------------
+setting number of micro-batches to constant 2048
+> building GPT2BPETokenizer tokenizer ...
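
The parallel layout and batch schedule follow directly from the arguments above: the 128 GPUs are split into tensor-parallel groups of 4 and pipeline-parallel groups of 32, which leaves a data-parallel degree of 1, so each optimizer step runs 2048 micro-batches of size 1. The hidden size of 11600 across 64 layers is what makes this the "104B" model. A quick sanity check (the 12*L*h^2 formula assumes ffn_hidden_size = 4*hidden_size, as here, and ignores biases, layernorms and position embeddings; the padded vocab size of 50688 is printed a few lines below):

    # Back-of-the-envelope checks against the logged configuration.
    world_size, tp, pp = 128, 4, 32
    dp = world_size // (tp * pp)                  # 1 == data_parallel_size
    global_bs, micro_bs = 2048, 1
    micro_batches = global_bs // (micro_bs * dp)  # 2048, as logged above

    hidden, layers, padded_vocab = 11600, 64, 50688
    params = 12 * layers * hidden**2 + padded_vocab * hidden
    print(f"{params / 1e9:.1f}B parameters")      # ~103.9B, i.e. "104B"
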
+ > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
+> initializing torch distributed ...
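
The padding arithmetic behind the "padded vocab" line above: Megatron grows the vocabulary to the next multiple of make_vocab_size_divisible_by times the tensor-parallel degree, so each of the 4 tensor-parallel ranks holds an equal, alignment-friendly slice of the embedding matrix. A sketch of the rule (not Megatron's exact code):

    orig_vocab = 50257
    multiple = 128 * 4   # make_vocab_size_divisible_by * tensor_model_parallel_size
    padded = ((orig_vocab + multiple - 1) // multiple) * multiple
    print(padded, padded - orig_vocab)   # 50688 431
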
+> setting tensorboard ...
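
The tensorboard writer announced here is configured from tensorboard_dir and tensorboard_queue_size in the arguments above. A minimal sketch, assuming the standard torch.utils.tensorboard API rather than Megatron's exact wrapper code:

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(
        log_dir="/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard",
        max_queue=5,  # tensorboard_queue_size from the arguments
    )
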
+> initializing tensor model parallel with size 4
+> initializing pipeline model parallel with size 32
+> setting random seeds to 43 ...
+[2021-10-10 10:44:52,568] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
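
The model-parallel seed of 2761 is derived, not chosen: Megatron offsets the base seed by a fixed constant and then by the tensor-parallel rank, so dropout differs across tensor-parallel ranks while remaining reproducible; the data-parallel seed stays at the base value. A sketch, assuming the offset scheme in megatron/mpu/random.py:

    seed, tp_rank = 43, 0
    tensor_model_parallel_seed = seed + 2718 + tp_rank  # 2761, as logged above
    data_parallel_seed = seed                           # 43
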
+> compiling dataset index builder ...
+make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/data'
+make: Nothing to be done for 'default'.
+make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/data'
+>>> done with dataset index builder. Compilation time: 0.301 seconds
+> compiling and loading fused kernels ...
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
+
+                               !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler Pytorch was
+built with for this platform, which is g++ on linux. Please
+use g++ to to compile your extension. Alternatively, you may
+compile PyTorch from source using c++, and then you can also use
+c++ to compile your extension.
+
+See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
+with compiling PyTorch from source.
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+                               !! WARNING !!
+
+ warnings.warn(WRONG_COMPILER_WARNING.format(
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/fused_kernels/build/build.ninja...
+Building extension module scaled_upper_triang_masked_softmax_cuda...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+ninja: no work to do.
+Loading extension module scaled_upper_triang_masked_softmax_cuda...
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/fused_kernels/build/build.ninja...
+Building extension module scaled_masked_softmax_cuda...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+ninja: no work to do.
+Loading extension module scaled_masked_softmax_cuda...
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/fused_kernels/build/build.ninja...
+Building extension module fused_mix_prec_layer_norm_cuda...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+ninja: no work to do.
+Loading extension module fused_mix_prec_layer_norm_cuda...
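
The fused kernels above (scaled_upper_triang_masked_softmax_cuda, scaled_masked_softmax_cuda, fused_mix_prec_layer_norm_cuda) are JIT-compiled through torch.utils.cpp_extension; "ninja: no work to do." means a cached build was reused, and the compiler warning only flags that the extension compiler is invoked as c++ while PyTorch was built with g++ (on many Linux systems c++ is g++ under another name, so this is usually benign). A minimal sketch of such a JIT build, with illustrative paths and flags rather than Megatron's exact invocation:

    import pathlib
    from torch.utils import cpp_extension

    srcpath = pathlib.Path("megatron/fused_kernels")  # hypothetical source location
    scaled_masked_softmax_cuda = cpp_extension.load(
        name="scaled_masked_softmax_cuda",
        sources=[str(srcpath / "scaled_masked_softmax.cpp"),
                 str(srcpath / "scaled_masked_softmax_cuda.cu")],
        extra_cuda_cflags=["-O3"],
        verbose=True,  # prints the "Emitting ninja build file ..." lines seen here
    )
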
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
+
+                               !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. 
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
+Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: + + !! WARNING !! + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +Your compiler (c++) is not compatible with the compiler Pytorch was +built with for this platform, which is g++ on linux. Please +use g++ to to compile your extension. Alternatively, you may +compile PyTorch from source using c++, and then you can also use +c++ to compile your extension. + +See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help +with compiling PyTorch from source. +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! + + !! WARNING !! + + warnings.warn(WRONG_COMPILER_WARNING.format( +>>> done with compiling and loading fused kernels. Compilation time: 17.799 seconds +time to initialize megatron (seconds): 22.805 +[after megatron is initialized] datetime: 2021-10-10 10:45:10 +building GPT model ... 
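The UserWarning above (emitted once per rank, deduplicated here) comes from torch.utils.cpp_extension, which JIT-compiles Megatron's fused kernels with the `c++` binary by default while the installed PyTorch wheel was built with g++. A minimal sketch of the usual remedy, with illustrative source paths rather than the real megatron/fused_kernels file list: export CXX before the first build so the emitted build.ninja picks up g++.

    import os

    # torch.utils.cpp_extension reads CXX when it writes build.ninja, so this
    # must run before the first load() call (assumes g++ is on PATH).
    os.environ.setdefault("CXX", "g++")

    from torch.utils.cpp_extension import load

    # Illustrative sources only; the real kernels live under megatron/fused_kernels/.
    scaled_masked_softmax_cuda = load(
        name="scaled_masked_softmax_cuda",
        sources=["scaled_masked_softmax.cpp", "scaled_masked_softmax_cuda.cu"],
        verbose=True,  # prints the "Emitting ninja build file ..." lines seen above
    )

With a matched compiler the warning disappears; in this run it is harmless, since ninja reports "no work to do" and the cached modules load anyway.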
+[2021-10-10 10:45:10,856] [INFO] [utils.py:806:see_memory_usage] Before Building Model
+[2021-10-10 10:45:10,857] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
+[2021-10-10 10:45:10,857] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.02 GB, percent = 20.3%
+SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
+Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
+[2021-10-10 10:45:12,527] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
+stage=0 layers=5
+     0: _to_float16
+     1: EmbeddingPipe
+     2:
+     3: ParallelTransformerLayerPipe
+     4: ParallelTransformerLayerPipe
+stage=1 layers=2
+     5: ParallelTransformerLayerPipe
+     6: ParallelTransformerLayerPipe
+stage=2 layers=2
+     7: ParallelTransformerLayerPipe
+     8: ParallelTransformerLayerPipe
+stage=3 layers=2
+     9: ParallelTransformerLayerPipe
+    10: ParallelTransformerLayerPipe
+stage=4 layers=2
+    11: ParallelTransformerLayerPipe
+    12: ParallelTransformerLayerPipe
+stage=5 layers=2
+    13: ParallelTransformerLayerPipe
+    14: ParallelTransformerLayerPipe
+stage=6 layers=2
+    15: ParallelTransformerLayerPipe
+    16: ParallelTransformerLayerPipe
+stage=7 layers=2
+    17: ParallelTransformerLayerPipe
+    18: ParallelTransformerLayerPipe
+stage=8 layers=2
+    19: ParallelTransformerLayerPipe
+    20: ParallelTransformerLayerPipe
+stage=9 layers=2
+    21: ParallelTransformerLayerPipe
+    22: ParallelTransformerLayerPipe
+stage=10 layers=2
+    23: ParallelTransformerLayerPipe
+    24: ParallelTransformerLayerPipe
+stage=11 layers=2
+    25: ParallelTransformerLayerPipe
+    26: ParallelTransformerLayerPipe
+stage=12 layers=2
+    27: ParallelTransformerLayerPipe
+    28: ParallelTransformerLayerPipe
+stage=13 layers=2
+    29: ParallelTransformerLayerPipe
+    30: ParallelTransformerLayerPipe
+stage=14 layers=2
+    31: ParallelTransformerLayerPipe
+    32: ParallelTransformerLayerPipe
+stage=15 layers=2
+    33: ParallelTransformerLayerPipe
+    34: ParallelTransformerLayerPipe
+stage=16 layers=2
+    35: ParallelTransformerLayerPipe
+    36: ParallelTransformerLayerPipe
+stage=17 layers=2
+    37: ParallelTransformerLayerPipe
+    38: ParallelTransformerLayerPipe
+stage=18 layers=2
+    39: ParallelTransformerLayerPipe
+    40: ParallelTransformerLayerPipe
+stage=19 layers=2
+    41: ParallelTransformerLayerPipe
+    42: ParallelTransformerLayerPipe
+stage=20 layers=2
+    43: ParallelTransformerLayerPipe
+    44: ParallelTransformerLayerPipe
+stage=21 layers=2
+    45: ParallelTransformerLayerPipe
+    46: ParallelTransformerLayerPipe
+stage=22 layers=2
+    47: ParallelTransformerLayerPipe
+    48: ParallelTransformerLayerPipe
+stage=23 layers=2
+    49: ParallelTransformerLayerPipe
+    50: ParallelTransformerLayerPipe
+stage=24 layers=2
+    51: ParallelTransformerLayerPipe
+    52: ParallelTransformerLayerPipe
+stage=25 layers=2
+    53: ParallelTransformerLayerPipe
+    54: ParallelTransformerLayerPipe
+stage=26 layers=2
+    55: ParallelTransformerLayerPipe
+    56: ParallelTransformerLayerPipe
+stage=27 layers=2
+    57: ParallelTransformerLayerPipe
+    58: ParallelTransformerLayerPipe
+stage=28 layers=2
+    59: ParallelTransformerLayerPipe
+    60: ParallelTransformerLayerPipe
+stage=29 layers=2
+    61: ParallelTransformerLayerPipe
+    62: ParallelTransformerLayerPipe
+stage=30 layers=2
+    63: ParallelTransformerLayerPipe
+    64: ParallelTransformerLayerPipe
+stage=31 layers=6
+    65: ParallelTransformerLayerPipe
+    66: ParallelTransformerLayerPipe
+    67:
+    68: MixedFusedLayerNorm
+    69: EmbeddingPipe
+    70: float16_to_fp32
+  loss: CrossEntropy
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 17): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 17): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 17): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 23): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 7): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 23): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 23): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 21): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 8): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 8): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 8): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 8): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 20): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 26): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 26): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 20): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 26): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 20): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 27): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 27): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 20): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 29): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 26): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 27): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 29): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 29): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 27): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 6): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 25): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 29): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 25): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 25): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 25): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 9): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 13): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 10): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 13): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 10): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 10): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 13): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 10): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 13): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 15): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 15): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 15): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 15): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 12): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 12): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 12): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 12): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 23): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 18): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 18): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 18): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 18): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 30): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 30): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 30): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 30): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 21): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 21): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 14): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 14): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 14): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 14): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 21): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 9): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 9): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 9): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 19): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 19): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 16): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 19): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 16): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 16): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 19): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 16): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 4): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 4): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 28): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 28): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 28): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 28): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 11): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 11): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 11): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 11): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 17): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 22): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 22): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 22): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 24): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 24): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 24): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 22): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 24): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 807539800
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 978291800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 978291800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 978291800
+[2021-10-10 10:45:13,240] [INFO] [utils.py:806:see_memory_usage] After Building Model
+[2021-10-10 10:45:13,241] [INFO] [utils.py:807:see_memory_usage] MA 1.88 GB Max_MA 1.9 GB CA 1.91 GB Max_CA 2 GB
+[2021-10-10 10:45:13,241] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.21 GB, percent = 20.4%
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 978291800
+setting training iterations to 292968
+> learning rate decay style: cosine
+DeepSpeed is enabled.
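The topology and partition dumps above fully determine the 128-GPU grid: 32 pipeline stages x 4 tensor-parallel ranks x 1 data-parallel replica, laid out pipe-major, with the 64 ParallelTransformerLayerPipe layers split two per stage and the tied EmbeddingPipe placed on the first and last stages. A minimal sketch that checks this arithmetic against the logged numbers (the coordinate formula is inferred from the "Using topology" dump above, not taken from DeepSpeed's own topology code):

    PP, DP, TP = 32, 1, 4  # pipeline, data and tensor parallel degrees

    def coord(rank):
        # pipe-major layout: rank = (pipe * DP + data) * TP + model
        return rank // (TP * DP), (rank // TP) % DP, rank % TP

    assert coord(0) == (0, 0, 0)
    assert coord(73) == (18, 0, 1)   # ProcessCoord(pipe=18, data=0, model=1): 73
    assert coord(127) == (31, 0, 3)  # ProcessCoord(pipe=31, data=0, model=3): 127

    # Per-shard parameter counts from the dump: 30 middle stages hold
    # 807,539,800 parameters per tensor shard, while the first and last
    # stages (embeddings plus final LayerNorm) hold ~978M each. The sum
    # matches the TOTAL_PARAMS=104731203200 (~104.7B) reported by engine.py below.
    total = 30 * TP * 807_539_800 + TP * 978_291_800 + TP * 978_315_000
    assert total == 104_731_203_200

The UNIQUE_PARAMS figure reported further down (104048195200) is smaller than TOTAL_PARAMS, consistent with the tied embedding weights being counted once rather than on both end stages.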
+[2021-10-10 10:45:13,242] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+cd7967d, git-hash=cd7967d, git-branch=master +[2021-10-10 10:45:13,279] [INFO] [engine.py:204:__init__] DeepSpeed Flops Profiler Enabled: False +[2021-10-10 10:45:13,279] [INFO] [engine.py:848:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer +[2021-10-10 10:45:13,279] [INFO] [engine.py:854:_configure_optimizer] Using client Optimizer as basic optimizer +[2021-10-10 10:45:13,279] [INFO] [engine.py:870:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam +[2021-10-10 10:45:13,280] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= +[2021-10-10 10:45:13,280] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer +[2021-10-10 10:45:13,280] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 +[2021-10-10 10:45:13,280] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 +[2021-10-10 10:45:13,280] [INFO] [stage2.py:113:__init__] CPU Offload: False +[2021-10-10 10:45:13,280] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False +Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 97 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] +Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 64 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] +Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 
46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 2 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 124 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 127 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 0 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 1 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 125 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 3 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 126 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +[2021-10-10 10:45:15,114] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states +[2021-10-10 10:45:15,114] [INFO] [utils.py:807:see_memory_usage] MA 5.48 GB Max_MA 7.3 GB CA 9.25 GB Max_CA 9 GB +[2021-10-10 10:45:15,115] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.23 GB, percent = 20.4% +[2021-10-10 10:45:15,167] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states +[2021-10-10 10:45:15,168] [INFO] [utils.py:807:see_memory_usage] MA 12.77 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB +[2021-10-10 10:45:15,168] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.23 GB, percent = 20.4% +[2021-10-10 10:45:15,168] [INFO] [stage2.py:474:__init__] optimizer state initialized +[2021-10-10 10:45:15,203] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer +[2021-10-10 10:45:15,204] [INFO] [utils.py:807:see_memory_usage] MA 12.77 GB Max_MA 12.77 GB CA 20.19 GB Max_CA 20 GB +[2021-10-10 10:45:15,204] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.23 GB, percent = 20.4% +[2021-10-10 10:45:15,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +[2021-10-10 10:45:15,204] [INFO] [engine.py:596:_configure_lr_scheduler] DeepSpeed using client LR scheduler +[2021-10-10 10:45:15,204] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +[2021-10-10 10:45:15,204] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] +[2021-10-10 10:45:15,204] [INFO] [config.py:940:print] DeepSpeedEngine configuration: +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] activation_checkpointing_config { + "partition_activations": false, + "contiguous_memory_optimization": false, + "cpu_checkpointing": false, + "number_checkpoints": null, + "synchronize_checkpoint_boundary": false, + "profile": false +} +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] amp_enabled .................. False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] amp_params ................... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] curriculum_enabled ........... 
True +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] dataloader_drop_last ......... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] disable_allgather ............ False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] dump_state ................... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_enabled ........... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] eigenvalue_verbose ........... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] elasticity_enabled ........... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] flops_profiler_config ........ { + "enabled": false, + "profile_step": 1, + "module_depth": -1, + "top_modules": 1, + "detailed": true, + "output_file": null +} +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] fp16_enabled ................. True +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False +[2021-10-10 10:45:15,205] [INFO] [config.py:944:print] global_rank .................. 0 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] gradient_accumulation_steps .. 2048 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] loss_scale ................... 0 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] memory_breakdown ............. False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] optimizer_name ............... None +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] optimizer_params ............. None +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] pld_enabled .................. False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] pld_params ................... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] prescale_gradients ........... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_change_rate ......... 
0.001 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_groups .............. 1 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_offset .............. 1000 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_period .............. 1000 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_rounding ............ 0 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_start_bits .......... 16 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_target_bits ......... 8 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_training_enabled .... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_type ................ 0 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] quantize_verbose ............. False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] scheduler_name ............... None +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] scheduler_params ............. None +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] sparse_attention ............. None +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] steps_per_print .............. 2000 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] tensorboard_enabled .......... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] tensorboard_output_path ...... +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] train_batch_size ............. 2048 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] use_quantizer_kernel ......... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] wall_clock_breakdown ......... False +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] world_size ................... 1 +[2021-10-10 10:45:15,206] [INFO] [config.py:944:print] zero_allow_untested_optimizer False +[2021-10-10 10:45:15,207] [INFO] [config.py:944:print] zero_config .................. { + "stage": 1, + "contiguous_gradients": true, + "reduce_scatter": true, + "reduce_bucket_size": 5.000000e+08, + "allgather_partitions": true, + "allgather_bucket_size": 5.000000e+08, + "overlap_comm": false, + "load_from_fp32_weights": true, + "elastic_checkpoint": true, + "offload_param": null, + "offload_optimizer": null, + "sub_group_size": 1.000000e+09, + "prefetch_bucket_size": 5.000000e+07, + "param_persistence_threshold": 1.000000e+05, + "max_live_parameters": 1.000000e+09, + "max_reuse_distance": 1.000000e+09, + "gather_fp16_weights_on_model_save": false, + "ignore_unused_parameters": true, + "round_robin_gradients": false, + "legacy_stage1": false +} +[2021-10-10 10:45:15,207] [INFO] [config.py:944:print] zero_enabled ................. True +[2021-10-10 10:45:15,207] [INFO] [config.py:944:print] zero_optimization_stage ...... 
1 +[2021-10-10 10:45:15,207] [INFO] [config.py:946:print] json = { + "train_micro_batch_size_per_gpu": 1, + "train_batch_size": 2.048000e+03, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": 1 + }, + "fp16": { + "enabled": true, + "loss_scale": 0, + "loss_scale_window": 500, + "hysteresis": 2, + "min_loss_scale": 1, + "initial_scale_power": 12 + }, + "curriculum_learning": { + "enabled": true, + "curriculum_type": "seqlen", + "min_difficulty": 64, + "max_difficulty": 2.048000e+03, + "schedule_type": "fixed_linear", + "schedule_config": { + "total_curriculum_step": 3.600000e+04, + "difficulty_step": 8 + } + }, + "steps_per_print": 2.000000e+03, + "wall_clock_breakdown": false +} +[2021-10-10 10:45:15,207] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] 
[engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] 
[engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] 
[INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] 
[engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] 
[engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] 
[engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] 
[INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,595] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 10:45:15,682] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-10 10:45:15,682] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints 
+    will not load any checkpoints and will start from random 
+time (ms) | load-checkpoint: 3.77
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
+ warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
+estimated model parameters: 103.3650944
+estimated model parameters: 125.2213504
+estimated model parameters: 125.22432
+estimated model parameters without embeddings: 103.3650944
+estimated model parameters without embeddings: 103.368064
first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") + +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated 
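A note on the warning above: with pipeline parallelism the tied input/output embedding weights are materialized on both the first and the last pipeline stage, so summing per-rank parameter counts reports that matrix twice. A toy sketch of the effect with made-up per-stage numbers (only the GPT-2 vocab size and the --hidden-size from the launch command further below are taken from this run):

    # Toy illustration of why per-rank sums over-report with PP > 1.
    # Assumption: stage 0 holds the input embedding and the last stage holds
    # the tied output projection -- two copies of the same (vocab x hidden) matrix.
    vocab, hidden = 50257, 11600           # GPT-2 vocab, --hidden-size 11600
    embedding = vocab * hidden             # one copy of the tied weights
    body_per_stage = 3_230_000_000         # made-up transformer share per stage
    per_stage = [embedding + body_per_stage] + [body_per_stage] * 30 + [embedding + body_per_stage]
    naive = sum(per_stage)                 # what summing the rank counts reports
    corrected = naive - embedding          # the tied matrix should count only once
    print(naive - corrected)               # over-count = exactly one embedding copy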
+[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-10 10:45:15
+> building train, validation, and test datasets ...
+ > datasets target sizes (minimum size):
+    train:      600000000
+    validation: 3000320
+    test:       10240
+> building train, validation, and test datasets for GPT ...
+ > building dataset index ...
+    reading sizes...
+    reading pointers...
+    reading document index...
+    creating numpy buffer of mmap...
+    creating memory view of numpy buffer...
+ > finished creating indexed dataset in 0.140790 seconds
+    number of documents: 304230423
+ > dataset split:
+    train:
+     document indices in [0, 288714672) total of 288714672 documents
+    validation:
+     document indices in [288714672, 303926193) total of 15211521 documents
+    test:
+     document indices in [303926193, 304230423) total of 304230 documents
+ > WARNING: could not find index map files, building the indices on rank 0 ...
+ > last epoch number of samples (73851107) is smaller than 80% of number of samples per epoch (131537223), setting separate_last_epoch to True
+ > elapsed time to build and save doc-idx mapping (seconds): 126.075045
+    using:
+     number of documents: 288714672
+     number of epochs: 5
+     sequence length: 2048
+     total number of samples: 657686116
+ > elapsed time to build and save sample-idx mapping (seconds): 37.277918
+ > building shuffle index with split [0, 526148893) and [526148893, 657686116) ...
+ > elapsed time to build and save shuffle-idx mapping (seconds): 44.179906
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.107 seconds
+    total number of samples: 657686117
+    total number of epochs: 5
+ > WARNING: could not find index map files, building the indices on rank 0 ...
+ > only one epoch required, setting separate_last_epoch to False
+ > elapsed time to build and save doc-idx mapping (seconds): 1.007942
+    using:
+     number of documents: 15211521
+     number of epochs: 1
+     sequence length: 2048
+     total number of samples: 6927160
+ > elapsed time to build and save sample-idx mapping (seconds): 0.383493
+ > building shuffle index with split [0, 6927160) and [6927160, 6927160) ...
+ > elapsed time to build and save shuffle-idx mapping (seconds): 0.321055
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.043 seconds
+    total number of samples: 6927161
+    total number of epochs: 1
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.034 seconds
+    total number of samples: 137384
+    total number of epochs: 1
+> finished creating GPT datasets ...
+[after dataloaders are built] datetime: 2021-10-10 10:48:50
+done with setup ...
+training ...
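The split boundaries and the separate_last_epoch decision above can be re-derived from the logged numbers. Below is an illustrative sketch of the round-then-shift split arithmetic (not the verbatim Megatron source) applied to the 304230423 documents and the '--split', '949,50,1' setting visible in the launch command further below:

    # Reproduce the logged split ranges for 949/50/1 over 304230423 documents.
    def split_indices(size, weights=(949, 50, 1)):
        total = float(sum(weights))
        idx = [0]
        for w in weights:
            idx.append(idx[-1] + int(round(w / total * size)))
        diff = idx[-1] - size                  # rounding error, pushed into all boundaries
        return [i - diff if n else 0 for n, i in enumerate(idx)]

    print(split_indices(304230423))
    # -> [0, 288714672, 303926193, 304230423], matching the document ranges above

    # separate_last_epoch: the 5th epoch supplies only 73851107 of the requested
    # 600M samples, under 80% of a full epoch (131537223 samples), so the partial
    # epoch is shuffled on its own rather than mixed with the full epochs.
    assert 73851107 < int(0.80 * 131537223)    # True -> separate_last_epoch = True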
+time (ms) | model-and-optimizer-setup: 4882.59 | train/valid/test-data-iterators-setup: 213714.07
+Number of parameters: 125.2213504 billion
+Number of parameters: 125.22432 billion
+Number of parameters: 103.3650944 billion
+Number of parameters without embeddings: 103.3650944 billion
+Number of parameters without embeddings: 103.368064 billion
[... each rank reports its own count; interleaved repeats of the five values above elided ...]
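The ~103.4B figure repeated by every rank squares with the standard rule of thumb for a transformer body. A quick check with this run's geometry (the 12·L·h² estimate ignores embeddings, biases, and layernorms, hence the small shortfall against the logged value):

    # Back-of-the-envelope count for --num-layers 64, --hidden-size 11600.
    layers, hidden = 64, 11600
    body = 12 * layers * hidden ** 2       # 4h^2 attention + 8h^2 MLP per layer
    print(body / 1e9)                      # ~103.34B vs the logged 103.3650944B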
+[before the start of training step] datetime: 2021-10-10 10:48:50
+[2021-10-10 10:48:50,114] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
+[2021-10-10 10:48:50,114] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
+[2021-10-10 10:48:50,115] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers
+[2021-10-10 10:48:50,115] [INFO] [checkpointing.py:554:forward] ----Synchronization False
+[2021-10-10 10:48:50,115] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[... the four local ranks raise the same exception; their interleaved tracebacks are shown once, de-interleaved ...]
+Traceback (most recent call last):
+  File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 246, in <module>
+    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/training.py", line 165, in pretrain
+    iteration = train(forward_step_func,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/training.py", line 732, in train
+    train_step(forward_step_func,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/training.py", line 405, in train_step
+    loss = model[0].train_batch(data_iter=data_iterator)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/engine.py", line 329, in train_batch
+    self._exec_schedule(sched)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/engine.py", line 1313, in _exec_schedule
+    self._exec_instr(**cmd.kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/engine.py", line 631, in _exec_forward_pass
+    outputs = super().forward(inputs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/engine.py", line 1321, in forward
+    loss = self.module(*inputs, **kwargs)
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/module.py", line 352, in forward
+    x = self.activation_checkpoint_func(
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 743, in checkpoint
+    CheckpointFunction.apply(function, all_outputs, *args)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 582, in forward
+    outputs = run_function(*inputs_cuda)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/module.py", line 330, in exec_func
+    inputs = layer(inputs)
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/transformer.py", line 588, in forward
+    return super().forward(hidden_states, attention_mask, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/transformer.py", line 479, in forward
+    self.self_attention(layernorm_output,
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/transformer.py", line 333, in forward
+    attention_probs = self.scale_mask_softmax(attention_scores,
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/fused_softmax.py", line 146, in forward
+    probs = ScaledUpperTriangMaskedSoftmax.apply(input, scale)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/fused_softmax.py", line 34, in forward
+    softmax_results = scaled_upper_triang_masked_softmax_cuda.forward(
+RuntimeError: attn_batches % batches_per_block == 0 INTERNAL ASSERT FAILED at "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h":363, please report a bug to PyTorch.
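The assertion that killed the step comes from the fused upper-triangular masked-softmax CUDA kernel: the number of independent softmax batches a rank processes must be a multiple of the batches the kernel packs into one CUDA block. A hedged sketch of the invariant (assuming attn_batches is micro-batch × heads-per-tensor-parallel-partition; the batches_per_block derivation lives in scaled_upper_triang_masked_softmax.h and is treated as an opaque input here):

    # Pre-flight sketch of the invariant that fired, with this run's geometry:
    # --micro-batch-size 1, --num-attention-heads 80, --tensor-model-parallel-size 4.
    def fused_softmax_geometry_ok(micro_batch, num_heads, tp_size, batches_per_block):
        attn_batches = micro_batch * (num_heads // tp_size)   # 1 * (80 // 4) = 20
        return attn_batches % batches_per_block == 0

    print(fused_softmax_geometry_ok(1, 80, 4, batches_per_block=8))  # illustrative value -> False

Whatever batches_per_block the kernel derived for sequence length 2048 evidently does not divide 20; the usual escape hatches are picking a head count whose per-partition share satisfies the kernel, or disabling the fused path (Megatron's --no-masked-softmax-fusion flag).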
+Killing subprocess 1304934
+Killing subprocess 1304935
+Killing subprocess 1304936
+Killing subprocess 1304937
+Traceback (most recent call last):
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main
+    return _run_code(code, main_globals, None,
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code
+    exec(code, run_globals)
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
+    main()
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
+    sigkill_handler(signal.SIGTERM, None)  # not coming back
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
+    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
+subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1504412.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1.
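For orientation, the parallel layout implied by the flags in the failed command and by the kill notices (assuming one rank per GPU, with the four subprocesses per node seen above and the 32 srun tasks listed below):

    # Layout arithmetic from the launcher flags (TP=4, PP=32) and 32 srun tasks.
    world_size = 32 * 4                      # 32 nodes x 4 ranks per node = 128
    tp, pp = 4, 32
    dp = world_size // (tp * pp)             # = 1 data-parallel replica
    micro, global_batch = 1, 2048
    microbatches_per_step = global_batch // (micro * dp)   # = 2048 per optimizer step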
+srun: error: r7i4n4: task 0: Exited with exit code 1
+srun: Terminating job step 1504412.0
[... per-process "Killing subprocess <pid>" and "Main process received SIGTERM, exiting" notices from the remaining nodes elided ...]
+srun: error: r7i6n0: task 14: Exited with exit code 1
+srun: error: r7i5n7: task 12: Exited with exit code 1
+srun: error: r7i5n8: task 13: Exited with exit code 1
+srun: error: r9i6n5: task 19: Exited with exit code 1
+srun: error: r9i6n2: task 16: Exited with exit code 1
+srun: error: r9i7n2: task 25: Exited with exit code 1
+srun: error: r9i6n1: task 15: Exited with exit code 1
+srun: error: r9i6n6: task 20: Exited with exit code 1
+srun: error: r9i6n7: task 21: Exited with exit code 1
+srun: error: r7i5n0: task 5: Exited with exit code 1
+srun: error: r9i6n3: task 17: Exited with exit code 1
+srun: error: r9i7n4: task 27: Exited with exit code 1
+srun: error: r9i7n0: task 23: Exited with exit code 1
+srun: error: r7i4n6: task 2: Exited with exit code 1
+srun: error: r9i6n4: task 18: Exited with exit code 1
+srun: error: r7i4n8: task 4: Exited with exit code 1
+srun: error: r7i5n1: task 6: Exited with exit code 1
+srun: error: r9i7n5: task 28: Exited with exit code 1
+srun: error: r9i6n8: task 22: Exited with exit code 1
+srun: error: r7i5n4: task 9: Exited with exit code 1
+srun: error: r7i5n6: task 11: Exited with exit code 1
+srun: error: r9i7n1: task 24: Exited with exit code 1
+srun: error: r7i4n7: task 3: Exited with exit code 1
+srun: error: r7i5n5: task 10: Exited with exit code 1
+srun: error: r7i5n3: task 8: Exited with exit code 1
+srun: error: r9i7n3: task 26: Exited with exit code 1
+srun: error: r9i7n8: task 31: Exited with exit code 1
+srun: error: r7i5n2: task 7: Exited with exit code 1
+srun: error: r9i7n7: task 30: Exited with exit code 1
+srun: error: r9i7n6: task 29: Exited with exit code 1
+srun: error: r7i4n5: task 1: Exited with exit code 1
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
[... the OMP_NUM_THREADS banner is printed once per launcher process; repeats elided ...]
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+ runtime if needed. Op compatibility means that your system
+ meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
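+Every rank prints this table when DeepSpeed is imported: [YES] marks ops that were prebuilt at install time (here only cpu_adam), while [NO] ops are compiled just-in-time with ninja the first time they are used. Assuming a standard DeepSpeed install, which ships a ds_report console script, the same table can be regenerated outside a job:
+
+import subprocess
+
+# ds_report prints the same op-compatibility table as above, plus the
+# torch/CUDA versions it was detected against.
+subprocess.run(["ds_report"], check=True)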
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
+ [WARNING] async_io: please install the libaio-devel package with yum
+ [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
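The two warnings above mean the async_io op cannot be compiled because the libaio development files are missing on the node. A hedged sketch for checking whether the libaio shared object is visible at all (the actual DeepSpeed probe also looks for the headers, which this does not cover):

    # Sketch: check for the libaio shared object that async_io needs.
    import ctypes.util
    print("libaio found:", ctypes.util.find_library("aio") is not None)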
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
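The torch-side fields of the environment block above can be reproduced directly from the runtime; a minimal sketch (the deepspeed fields come from that package's own metadata and are not recomputed here):

    # Sketch: gather the torch fields of the environment report.
    import torch
    print("torch install path ...............", torch.__path__)
    print("torch version ....................", torch.__version__)
    print("torch cuda version ...............", torch.version.cuda)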
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
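git is not on PATH in the job's shell, so Megatron records unknown git metadata. A hypothetical reimplementation of that probe, shown only to illustrate the fallback (this is not Megatron's actual code):

    # Sketch (hypothetical): fall back to "unknown" when git is unavailable.
    import subprocess
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).decode().strip()
    except (OSError, subprocess.CalledProcessError):
        git_hash = "unknown"
    print(f"**** Git info for Megatron: git_hash={git_hash} ****")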
+ [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING] async_io: please install the libaio-devel package with yum
+ [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
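The async_io line reads [NO] ....... [NO] because the op is neither prebuilt nor JIT-compilable without libaio, which is exactly what the warnings describe. A quick runtime probe for the shared object (a sketch; the JIT build additionally needs the libaio headers, which this does not check):

# Probe for the libaio runtime library the async_io op links against.
import ctypes

try:
    ctypes.CDLL("libaio.so.1")
    print("libaio runtime library found")
except OSError:
    # Matches the remedy in the warnings: libaio-devel via yum (or
    # libaio-dev via apt), or CFLAGS/LDFLAGS pointed at a source install.
    print("libaio not found")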
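Most fields in the environment-info block that follows are plain package metadata and can be reproduced with a few lines; a sketch (the nvcc version comes from the CUDA toolkit on PATH and is queried separately):

# Reprint the torch/deepspeed fields from the environment-info block.
import torch
import deepspeed

print("torch install path ......", list(torch.__path__))
print("torch version ...........", torch.__version__)
print("torch cuda version ......", torch.version.cuda)
print("deepspeed install path ..", list(deepspeed.__path__))
print("deepspeed info ..........", deepspeed.__version__)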
+DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +transformer_inference .. [NO] ....... [OKAY] +utils ..................  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.[YES] ...... +[OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... 
[OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found + [WARNING]  async_io: please install the libaio-devel package with yum +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... async_io[NO] ...................... [NO] +[NO] ....... [NO] +transformer_inference .. [NO] .......transformer_inference [OKAY].. + [NO] ....... utils[OKAY] +.................. [YES] ...... [OKAY] +utils .................. [YES]quantizer .................... [OKAY][NO] + ....... [OKAY] +quantizer ..............-------------------------------------------------- +[NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  async_io: please install the libaio-devel package with yum +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +DeepSpeed general environment info: +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... 
[OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +torch version .................... 1.8.1 + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +async_io ............... async_io[NO] ...................... [NO][NO] +....... [NO] +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] + +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +utils utils.................. ..................[YES] [YES]...... ......[OKAY] +[OKAY] +quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum +DeepSpeed general environment info: + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +async_io ............... [NO] ....... 
[NO] +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + +async_io async_io............... [NO]............... .......[NO] [NO]....... + [NO] +transformer_inferencetransformer_inference .... [NO][NO] ....... .......[OKAY] +[OKAY] +utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] + +quantizerquantizer ............................ [NO][NO] .............. [OKAY] +[OKAY] +-------------------------------------------------- +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +transformer_inference ..utils [NO].................. .......[YES] [OKAY]...... + [OKAY] +quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] +[OKAY] +--------------------------------------------------quantizer + .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
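The warnings above mean DeepSpeed's optional async_io op cannot be built because the libaio development files are missing on the compute nodes; the op is reported [NO]/[NO] but everything else proceeds. A minimal sketch, assuming a POSIX system with only the Python standard library, of checking for the shared object before expecting the op to compile:

```python
# Minimal sketch, assuming a POSIX system: check whether libaio is
# discoverable before expecting DeepSpeed's async_io op to JIT-compile.
import ctypes.util

libaio = ctypes.util.find_library("aio")
if libaio is None:
    # matches the remedy in the log: install libaio-devel via yum,
    # or point CFLAGS/LDFLAGS at a from-source install
    print("libaio not found; async_io will stay [NO]")
else:
    print(f"libaio found: {libaio}")
```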
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
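Each rank prints this "DeepSpeed general environment info" block (the same report DeepSpeed's ds_report utility emits); the versions matter because the wheel was compiled against torch 1.8 / CUDA 11.1 and must match the runtime torch. A minimal sketch, assuming torch and deepspeed are importable in the same environment, of reproducing the key lines by hand:

```python
# Minimal sketch, assuming torch and deepspeed are importable:
# reproduce the version lines from the report above.
import torch
import deepspeed

print("torch install path ...............", torch.__path__)
print("torch version ....................", torch.__version__)
print("torch cuda version ...............", torch.version.cuda)
print("deepspeed info ...................", deepspeed.__version__)
```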
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
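The "/bin/sh: line 0: type: git: not found" lines show Megatron shelling out with `type git` to collect commit metadata; since git is not on the compute nodes, it falls back to git_hash=unknown git_branch=unknown, which is harmless. A hedged sketch of an equivalent probe (git_info is a hypothetical helper, not Megatron's actual function):

```python
# Hedged sketch (git_info is a hypothetical helper, not Megatron's
# actual code): probe for git and fall back to "unknown" as the log does.
import shutil
import subprocess

def git_info():
    if shutil.which("git") is None:  # same check `type git` performs in sh
        return "unknown", "unknown"
    run = lambda *a: subprocess.run(a, capture_output=True, text=True).stdout.strip()
    return (run("git", "rev-parse", "--short", "HEAD") or "unknown",
            run("git", "rev-parse", "--abbrev-ref", "HEAD") or "unknown")

git_hash, git_branch = git_info()
print(f"**** Git info for Megatron: git_hash={git_hash} git_branch={git_branch} ****")
```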
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meets the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops require ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
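Per the op report, only cpu_adam was prebuilt ([YES]); the remaining ops are compatible and will be JIT-compiled on first use, which requires ninja. A minimal sketch of verifying that prerequisite up front, assuming ninja is expected on PATH:

```python
# Minimal sketch: confirm ninja is on PATH, since the report notes that
# JIT-compiled DeepSpeed ops require it at first use.
import shutil

if shutil.which("ninja") is None:
    raise SystemExit("ninja not found; [NO] ops cannot be JIT-compiled")
print("ninja .................. [OKAY]")
```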
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+cd7967d, cd7967d, master0.5.5+cd7967d, cd7967d, master + +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +DeepSpeed general environment info: +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed'] +deepspeed info ................... 
+using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
+using torch.float16 for parameters ...
+------------------------ arguments ------------------------
+ accumulate_allreduce_grads_in_fp32 .............. False
+ adam_beta1 ...................................... 0.9
+ adam_beta2 ...................................... 0.95
+ adam_eps ........................................ 1e-08
+ adlr_autoresume ................................. False
+ adlr_autoresume_interval ........................ 1000
+ apply_query_key_layer_scaling ................... True
+ apply_residual_connection_post_layernorm ........ False
+ attention_dropout ............................... 0.1
+ attention_softmax_in_fp32 ....................... False
+ bert_binary_head ................................ True
+ bert_load ....................................... None
+ bf16 ............................................ False
+ bias_dropout_fusion ............................. True
+ bias_gelu_fusion ................................ True
+ biencoder_projection_dim ........................ 0
+ biencoder_shared_query_context_model ............ False
+ block_data_path ................................. None
+ checkpoint_activations .......................... True
+ checkpoint_in_cpu ............................... False
+ checkpoint_num_layers ........................... 1
+ clip_grad ....................................... 1.0
+ codecarbon_dir .................................. None
+ consumed_train_samples .......................... 0
+ consumed_train_tokens ........................... 0
+ consumed_valid_samples .......................... 0
+ contigious_checkpointing ........................ False
+ cpu_optimizer ................................... False
+ cpu_torch_adam .................................. False
+ curriculum_learning ............................. False
+ data_impl ....................................... mmap
+ data_parallel_size .............................. 1
+ data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
+ dataloader_type ................................. single
+ DDP_impl ........................................ local
+ decoder_seq_length .............................. None
+ deepscale ....................................... False
+ deepscale_config ................................ None
+ deepspeed ....................................... True
+ deepspeed_activation_checkpointing .............. True
+ deepspeed_config ................................ ./ds_config.1504567.json
+ deepspeed_mpi ................................... False
+ distribute_checkpointed_activations ............. False
+ distributed_backend ............................. nccl
+ embedding_path .................................. None
+ encoder_seq_length .............................. 2048
+ eod_mask_loss ................................... False
+ eval_interval ................................... 1000
+ eval_iters ...................................... 5
+ evidence_data_path .............................. None
+ exit_duration_in_mins ........................... 1190
+ exit_interval ................................... None
+ ffn_hidden_size ................................. 46400
+ finetune ........................................ False
+ fp16 ............................................ True
+ fp16_lm_cross_entropy ........................... False
+ fp32_residual_connection ........................ False
+ gigaflos_no_embeds .............................. 0
+ global_batch_size ............................... 2048
+ glu_activation .................................. None
+ hidden_dropout .................................. 0.1
+ hidden_size ..................................... 11600
+ hysteresis ...................................... 2
+ ict_head_size ................................... None
+ ict_load ........................................ None
+ img_dim ......................................... 224
+ indexer_batch_size .............................. 128
+ indexer_log_interval ............................ 1000
+ init_method_std ................................. 0.02
+ init_method_xavier_uniform ...................... False
+ initial_loss_scale .............................. 4294967296
+ kv_channels ..................................... 145
+ layernorm_epsilon ............................... 1e-05
+ lazy_mpu_init ................................... None
+ load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ local_rank ...................................... 0
+ log_batch_size_to_tensorboard ................... True
+ log_interval .................................... 1
+ log_learning_rate_to_tensorboard ................ True
+ log_loss_scale_to_tensorboard ................... True
+ log_num_zeros_in_grad ........................... False
+ log_params_norm ................................. False
+ log_timers_to_tensorboard ....................... True
+ log_validation_ppl_to_tensorboard ............... True
+ loss_on_targets_only ............................ False
+ loss_scale ...................................... 12.0
+ loss_scale_window ............................... 1000
+ lr .............................................. 6e-05
+ lr_decay_iters .................................. None
+ lr_decay_samples ................................ None
+ lr_decay_style .................................. cosine
+ lr_decay_tokens ................................. 260000000000
+ lr_warmup_fraction .............................. None
+ lr_warmup_iters ................................. 0
+ lr_warmup_samples ............................... 216320
+ make_vocab_size_divisible_by .................... 128
+ mask_prob ....................................... 0.15
+ masked_softmax_fusion ........................... False
+ max_position_embeddings ......................... 2048
+ memory_centric_tiled_linear ..................... False
+ merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
+ micro_batch_size ................................ 1
+ min_loss_scale .................................. 1.0
+ min_lr .......................................... 6e-06
+ mmap_warmup ..................................... False
+ no_load_optim ................................... None
+ no_load_rng ..................................... None
+ no_save_optim ................................... None
+ no_save_rng ..................................... None
+ num_attention_heads ............................. 80
+ num_channels .................................... 3
+ num_classes ..................................... 1000
+ num_layers ...................................... 64
+ num_layers_per_virtual_pipeline_stage ........... None
+ num_workers ..................................... 2
+ onnx_safe ....................................... None
+ openai_gelu ..................................... False
+ optimizer ....................................... adam
+ override_lr_scheduler ........................... False
+ params_dtype .................................... torch.float16
+ partition_activations ........................... False
+ patch_dim ....................................... 16
+ pipeline_model_parallel_size .................... 32
+ position_embedding_type ......................... PositionEmbeddingType.absolute
+ profile_backward ................................ False
+ query_in_block_prob ............................. 0.1
+ rampup_batch_size ............................... None
+ rank ............................................ 0
+ remote_device ................................... none
+ reset_attention_mask ............................ False
+ reset_position_ids .............................. False
+ retriever_report_topk_accuracies ................ []
+ retriever_score_scaling ......................... False
+ retriever_seq_length ............................ 256
+ sample_rate ..................................... 1.0
+ save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ save_interval ................................... 300
+ scatter_gather_tensors_in_pipeline .............. True
+ scattered_embeddings ............................ False
+ seed ............................................ 43
+ seq_length ...................................... 2048
+ sgd_momentum .................................... 0.9
+ short_seq_prob .................................. 0.1
+ split ........................................... 949,50,1
+ split_transformers .............................. False
+ synchronize_each_layer .......................... False
+ tensor_model_parallel_size ...................... 4
+ tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard
+ tensorboard_log_interval ........................ 1
+ tensorboard_queue_size .......................... 5
+ tile_factor ..................................... 1
+ titles_data_path ................................ None
+ tokenizer_name_or_path .......................... None
+ tokenizer_type .................................. GPT2BPETokenizer
+ train_iters ..................................... None
+ train_samples ................................... 600000000
+ train_tokens .................................... 300000000000
+ use_checkpoint_lr_scheduler ..................... False
+ use_contiguous_buffers_in_ddp ................... False
+ use_cpu_initialization .......................... None
+ use_one_sent_docs ............................... False
+ use_pin_memory .................................. False
+ virtual_pipeline_model_parallel_size ............ None
+ vocab_extra_ids ................................. 0
+ vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
+ weight_decay .................................... 0.1
+ world_size ...................................... 128
+ zero_allgather_bucket_size ...................... 0.0
+ zero_contigious_gradients ....................... False
+ zero_reduce_bucket_size ......................... 0.0
+ zero_reduce_scatter ............................. False
+ zero_stage ...................................... 1
+-------------------- end of arguments ---------------------
+setting number of micro-batches to constant 2048
+> building GPT2BPETokenizer tokenizer ...
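The "setting number of micro-batches to constant 2048" line above follows directly from the arguments dump: with no rampup_batch_size, the count is global_batch_size divided by micro_batch_size times data_parallel_size. A minimal sketch of that arithmetic (the function name is illustrative, not Megatron-DeepSpeed's actual helper):

    def num_micro_batches(global_batch_size, micro_batch_size, data_parallel_size):
        # samples processed per pipeline pass across all data-parallel replicas
        per_pass = micro_batch_size * data_parallel_size
        assert global_batch_size % per_pass == 0
        return global_batch_size // per_pass

    print(num_micro_batches(2048, 1, 1))  # -> 2048, as logged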
+ > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
+> initializing torch distributed ...
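A sketch of where the 431 dummy tokens come from: the vocabulary is padded so that each tensor-parallel shard of the embedding stays a multiple of make_vocab_size_divisible_by (128 here, with tensor_model_parallel_size=4). The function name below is illustrative:

    def pad_vocab(orig_size, divisible_by=128, tp_size=4):
        multiple = divisible_by * tp_size                      # 512 for this run
        padded = ((orig_size + multiple - 1) // multiple) * multiple
        return padded, padded - orig_size                      # (new size, dummy tokens)

    print(pad_vocab(50257))  # -> (50688, 431), matching the log line above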
+> setting tensorboard ...
+> initializing tensor model parallel with size 4
+> initializing pipeline model parallel with size 32
+> setting random seeds to 43 ...
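The seed line that follows is derived from the base seed: each tensor-parallel rank gets an offset model-parallel seed while the data-parallel RNG keeps the base seed. A hedged sketch of that derivation, assuming the constant 2718 offset used by Megatron-LM's model_parallel_cuda_manual_seed (43 + 2718 = 2761, as logged below):

    def model_parallel_seeds(seed, tp_rank):
        offset = seed + 2718                  # assumed Megatron-style offset
        return offset + tp_rank, seed         # (model-parallel seed, data-parallel seed)

    print(model_parallel_seeds(43, 0))  # -> (2761, 43)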
+[2021-10-10 11:10:49,718] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
+> compiling dataset index builder ...
+make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/data'
+make: Nothing to be done for 'default'.
+make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/data'
+>>> done with dataset index builder. Compilation time: 0.298 seconds
+WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.
+> compiling and loading fused kernels ...
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
+
+                               !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler Pytorch was
+built with for this platform, which is g++ on linux. Please
+use g++ to to compile your extension. Alternatively, you may
+compile PyTorch from source using c++, and then you can also use
+c++ to compile your extension.
+
+See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
+with compiling PyTorch from source.
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+                               !! WARNING !!
+
+ warnings.warn(WRONG_COMPILER_WARNING.format(
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/fused_kernels/build/build.ninja...
+Building extension module fused_mix_prec_layer_norm_cuda...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+ninja: no work to do.
+Loading extension module fused_mix_prec_layer_norm_cuda...
+>>> done with compiling and loading fused kernels. Compilation time: 4.757 seconds
+time to initialize megatron (seconds): 30.775
+[after megatron is initialized] datetime: 2021-10-10 11:10:54
+building GPT model ...
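The fused-kernel step above goes through PyTorch's JIT extension loader, which is what emits the "Emitting ninja build file ..." and "Loading extension module ..." lines. A minimal sketch of that mechanism; the kernel name follows the log, but the source file names and flags here are assumptions, not the exact arguments this codebase passes:

    from torch.utils import cpp_extension

    # Sketch: JIT-compile and load a fused CUDA kernel the way the log above shows.
    module = cpp_extension.load(
        name="fused_mix_prec_layer_norm_cuda",
        sources=["layer_norm_cuda.cpp", "layer_norm_cuda_kernel.cu"],  # illustrative paths
        extra_cuda_cflags=["-O3"],                                      # assumed flags
        verbose=True,  # prints the ninja build-file and module-loading messages
    )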
+[2021-10-10 11:10:54,842] [INFO] [utils.py:806:see_memory_usage] Before Building Model
+[2021-10-10 11:10:54,843] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
+[2021-10-10 11:10:54,843] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.02 GB, percent = 20.3%
+SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
+Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127}
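The topology map above is fully regular for this run: with data_parallel_size=1 and the model axis fastest, global rank = pipe * 4 + model. A sketch that reproduces the mapping, using an illustrative stand-in for the ProcessCoord type:

    from collections import namedtuple

    # Illustrative stand-in for the pipeline-parallel process coordinate.
    ProcessCoord = namedtuple("ProcessCoord", ["pipe", "data", "model"])

    TP, PP, DP = 4, 32, 1  # from the arguments dump: tensor=4, pipeline=32, data=1
    topology = {
        ProcessCoord(pipe=rank // TP, data=0, model=rank % TP): rank
        for rank in range(TP * PP * DP)  # world size 128
    }
    assert topology[ProcessCoord(pipe=18, data=0, model=0)] == 72  # matches the log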
+[2021-10-10 11:10:56,521] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
+stage=0 layers=5
+     0: _to_float16
+     1: EmbeddingPipe
+     2: <lambda>
+     3: ParallelTransformerLayerPipe
+     4: ParallelTransformerLayerPipe
+stage=1 layers=2
+     5: ParallelTransformerLayerPipe
+     6: ParallelTransformerLayerPipe
+stage=2 layers=2
+     7: ParallelTransformerLayerPipe
+     8: ParallelTransformerLayerPipe
+stage=3 layers=2
+     9: ParallelTransformerLayerPipe
+    10: ParallelTransformerLayerPipe
+stage=4 layers=2
+    11: ParallelTransformerLayerPipe
+    12: ParallelTransformerLayerPipe
+stage=5 layers=2
+    13: ParallelTransformerLayerPipe
+    14: ParallelTransformerLayerPipe
+stage=6 layers=2
+    15: ParallelTransformerLayerPipe
+    16: ParallelTransformerLayerPipe
+stage=7 layers=2
+    17: ParallelTransformerLayerPipe
+    18: ParallelTransformerLayerPipe
+stage=8 layers=2
+    19: ParallelTransformerLayerPipe
+    20: ParallelTransformerLayerPipe
+stage=9 layers=2
+    21: ParallelTransformerLayerPipe
+    22: ParallelTransformerLayerPipe
+stage=10 layers=2
+    23: ParallelTransformerLayerPipe
+    24: ParallelTransformerLayerPipe
+stage=11 layers=2
+    25: ParallelTransformerLayerPipe
+    26: ParallelTransformerLayerPipe
+stage=12 layers=2
+    27: ParallelTransformerLayerPipe
+    28: ParallelTransformerLayerPipe
+stage=13 layers=2
+    29: ParallelTransformerLayerPipe
+    30: ParallelTransformerLayerPipe
+stage=14 layers=2
+    31: ParallelTransformerLayerPipe
+    32: ParallelTransformerLayerPipe
+stage=15 layers=2
+    33: ParallelTransformerLayerPipe
+    34: ParallelTransformerLayerPipe
+stage=16 layers=2
+    35: ParallelTransformerLayerPipe
+    36: ParallelTransformerLayerPipe
+stage=17 layers=2
+    37: ParallelTransformerLayerPipe
+    38: ParallelTransformerLayerPipe
+stage=18 layers=2
+    39: ParallelTransformerLayerPipe
+    40: ParallelTransformerLayerPipe
+stage=19 layers=2
+    41: ParallelTransformerLayerPipe
+    42: ParallelTransformerLayerPipe
+stage=20 layers=2
+    43: ParallelTransformerLayerPipe
+    44: ParallelTransformerLayerPipe
+stage=21 layers=2
+    45: ParallelTransformerLayerPipe
+    46: ParallelTransformerLayerPipe
+stage=22 layers=2
+    47: ParallelTransformerLayerPipe
+    48: ParallelTransformerLayerPipe
+stage=23 layers=2
+    49: ParallelTransformerLayerPipe
+    50: ParallelTransformerLayerPipe
+stage=24 layers=2
+    51: ParallelTransformerLayerPipe
+    52: ParallelTransformerLayerPipe
+stage=25 layers=2
+    53: ParallelTransformerLayerPipe
+    54: ParallelTransformerLayerPipe
+stage=26 layers=2
+    55: ParallelTransformerLayerPipe
+    56: ParallelTransformerLayerPipe
+stage=27 layers=2
+    57: ParallelTransformerLayerPipe
+    58: ParallelTransformerLayerPipe
+stage=28 layers=2
+    59: ParallelTransformerLayerPipe
+    60: ParallelTransformerLayerPipe
+stage=29 layers=2
+    61: ParallelTransformerLayerPipe
+    62: ParallelTransformerLayerPipe
+stage=30 layers=2
+    63: ParallelTransformerLayerPipe
+    64: ParallelTransformerLayerPipe
+stage=31 layers=6
+    65: ParallelTransformerLayerPipe
+    66: ParallelTransformerLayerPipe
+    67: <lambda>
+    68: MixedFusedLayerNorm
+    69: EmbeddingPipe
+    70: float16_to_fp32
+  loss: CrossEntropy
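Each middle pipeline stage holds 2 of the 64 transformer layers, split 4 ways by tensor parallelism, which is where the per-rank count of 807539800 in the lines below comes from; summed over 128 ranks that is roughly 103.4B parameters, i.e. the "104B" in the experiment name. A rough sanity check, using the usual approximate per-layer formula for a GPT block with ffn_hidden_size = 4 * hidden_size and ignoring the embedding/layer-norm extras on the first and last stages:

    # Rough per-rank parameter estimate for a middle pipeline stage.
    h, layers_per_stage, tp = 11600, 2, 4
    per_layer = 12 * h * h + 13 * h          # approximate GPT block parameter count
    approx_per_rank = layers_per_stage * per_layer // tp
    print(approx_per_rank)                   # 807435400 vs. 807539800 logged
    print(128 * 807_539_800)                 # 103365094400 -> ~103.4B total, "104B"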
parameters on (tensor, pipeline) model parallel rank (1, 7): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 7): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 8): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 12): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 10): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 10): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 6): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 11): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 7): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 7): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 10): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 4): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 8): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 8): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 9): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 11): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 14): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 13): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 13): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 16): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 16): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 16): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 5): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 16): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 11): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 15): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 15): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 15): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 8): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 15): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 28): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 28): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 28): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 28): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 6): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 27): 807539800 + > number of parameters on (tensor, pipeline) model 
parallel rank (3, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 9): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 17): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 17): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 9): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 17): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 17): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 5): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 6): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 29): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 29): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 12): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 29): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 12): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 29): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 22): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 22): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 22): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 21): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 21): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 21): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 21): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 30): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 30): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 30): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 30): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 19): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 19): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 19): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 19): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 20): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 20): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 20): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 20): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 23): 807539800 + > 
number of parameters on (tensor, pipeline) model parallel rank (1, 23): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 23): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 23): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 25): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 25): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 25): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 25): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 18): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 26): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 18): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 26): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 18): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 26): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 26): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 18): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 24): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 24): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 24): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 24): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 978291800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 31): 978315000 + > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 978291800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 31): 978315000 + > number of parameters on (tensor, pipeline) model parallel rank (3, 31): 978315000 + > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 978291800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 31): 978315000 +[2021-10-10 11:10:57,262] [INFO] [utils.py:806:see_memory_usage] After Building Model +[2021-10-10 11:10:57,263] [INFO] [utils.py:807:see_memory_usage] MA 1.88 GB Max_MA 1.9 GB CA 1.91 GB Max_CA 2 GB +[2021-10-10 11:10:57,263] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.2 GB, percent = 20.4% + > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 978291800 +setting training iterations to 292968 +> learning rate decay style: cosine +DeepSpeed is enabled. 
+[2021-10-10 11:10:57,264] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+cd7967d, git-hash=cd7967d, git-branch=master +[2021-10-10 11:10:57,304] [INFO] [engine.py:204:__init__] DeepSpeed Flops Profiler Enabled: False +[2021-10-10 11:10:57,304] [INFO] [engine.py:848:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer +[2021-10-10 11:10:57,304] [INFO] [engine.py:854:_configure_optimizer] Using client Optimizer as basic optimizer +[2021-10-10 11:10:57,304] [INFO] [engine.py:870:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam +[2021-10-10 11:10:57,305] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= +[2021-10-10 11:10:57,305] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer +[2021-10-10 11:10:57,305] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 +[2021-10-10 11:10:57,305] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 +[2021-10-10 11:10:57,305] [INFO] [stage2.py:113:__init__] CPU Offload: False +[2021-10-10 11:10:57,305] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False +Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 91 partition count [1, 1] and 
sizes[(807360000, False), (179800, False)] +Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 109 partition count [1, 1] and sizes[(807360000, False), 
(179800, False)] +Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 
94 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)] +Rank: 127 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 3 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 2 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 1 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +Rank: 126 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 124 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 125 partition count [1, 1] and sizes[(978112000, False), (203000, False)] +Rank: 0 partition count [1, 1] and sizes[(978112000, False), (179800, False)] +[2021-10-10 11:10:59,122] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states +[2021-10-10 11:10:59,123] [INFO] [utils.py:807:see_memory_usage] MA 5.48 GB Max_MA 7.3 GB CA 9.25 GB Max_CA 9 GB +[2021-10-10 11:10:59,123] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.22 GB, percent = 20.4% +[2021-10-10 11:10:59,169] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states +[2021-10-10 11:10:59,169] [INFO] [utils.py:807:see_memory_usage] MA 12.77 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB +[2021-10-10 11:10:59,170] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.22 GB, percent = 20.4% +[2021-10-10 11:10:59,170] [INFO] [stage2.py:474:__init__] optimizer state initialized +[2021-10-10 11:10:59,198] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer +[2021-10-10 11:10:59,199] [INFO] [utils.py:807:see_memory_usage] MA 12.77 GB Max_MA 12.77 GB CA 20.19 GB Max_CA 20 GB +[2021-10-10 11:10:59,199] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.22 GB, percent = 20.4% +[2021-10-10 11:10:59,199] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +[2021-10-10 11:10:59,199] [INFO] [engine.py:596:_configure_lr_scheduler] DeepSpeed using client LR scheduler +[2021-10-10 11:10:59,199] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +[2021-10-10 11:10:59,199] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] +[2021-10-10 11:10:59,199] [INFO] [config.py:940:print] DeepSpeedEngine configuration: +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] activation_checkpointing_config { + "partition_activations": false, + "contiguous_memory_optimization": false, + "cpu_checkpointing": false, + "number_checkpoints": null, + "synchronize_checkpoint_boundary": false, + "profile": false +} +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] amp_enabled .................. False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] amp_params ................... False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] curriculum_enabled ........... 
True +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] dataloader_drop_last ......... False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] disable_allgather ............ False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] dump_state ................... False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_enabled ........... False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] eigenvalue_verbose ........... False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] elasticity_enabled ........... False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] flops_profiler_config ........ { + "enabled": false, + "profile_step": 1, + "module_depth": -1, + "top_modules": 1, + "detailed": true, + "output_file": null +} +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] fp16_enabled ................. True +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False +[2021-10-10 11:10:59,200] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] global_rank .................. 0 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] gradient_accumulation_steps .. 2048 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] loss_scale ................... 0 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] memory_breakdown ............. False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] optimizer_name ............... None +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] optimizer_params ............. None +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] pld_enabled .................. False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] pld_params ................... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] prescale_gradients ........... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_change_rate ......... 
0.001 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_groups .............. 1 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_offset .............. 1000 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_period .............. 1000 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_rounding ............ 0 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_start_bits .......... 16 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_target_bits ......... 8 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_training_enabled .... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_type ................ 0 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] quantize_verbose ............. False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] scheduler_name ............... None +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] scheduler_params ............. None +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] sparse_attention ............. None +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] steps_per_print .............. 2000 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] tensorboard_enabled .......... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] tensorboard_output_path ...... +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] train_batch_size ............. 2048 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] use_quantizer_kernel ......... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] wall_clock_breakdown ......... False +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] world_size ................... 1 +[2021-10-10 11:10:59,201] [INFO] [config.py:944:print] zero_allow_untested_optimizer False +[2021-10-10 11:10:59,202] [INFO] [config.py:944:print] zero_config .................. { + "stage": 1, + "contiguous_gradients": true, + "reduce_scatter": true, + "reduce_bucket_size": 5.000000e+08, + "allgather_partitions": true, + "allgather_bucket_size": 5.000000e+08, + "overlap_comm": false, + "load_from_fp32_weights": true, + "elastic_checkpoint": true, + "offload_param": null, + "offload_optimizer": null, + "sub_group_size": 1.000000e+09, + "prefetch_bucket_size": 5.000000e+07, + "param_persistence_threshold": 1.000000e+05, + "max_live_parameters": 1.000000e+09, + "max_reuse_distance": 1.000000e+09, + "gather_fp16_weights_on_model_save": false, + "ignore_unused_parameters": true, + "round_robin_gradients": false, + "legacy_stage1": false +} +[2021-10-10 11:10:59,202] [INFO] [config.py:944:print] zero_enabled ................. True +[2021-10-10 11:10:59,202] [INFO] [config.py:944:print] zero_optimization_stage ...... 
1 +[2021-10-10 11:10:59,202] [INFO] [config.py:946:print] json = { + "train_micro_batch_size_per_gpu": 1, + "train_batch_size": 2.048000e+03, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": 1 + }, + "fp16": { + "enabled": true, + "loss_scale": 0, + "loss_scale_window": 500, + "hysteresis": 2, + "min_loss_scale": 1, + "initial_scale_power": 12 + }, + "curriculum_learning": { + "enabled": true, + "curriculum_type": "seqlen", + "min_difficulty": 64, + "max_difficulty": 2.048000e+03, + "schedule_type": "fixed_linear", + "schedule_config": { + "total_curriculum_step": 3.600000e+04, + "difficulty_step": 8 + } + }, + "steps_per_print": 2.000000e+03, + "wall_clock_breakdown": false +} +[2021-10-10 11:10:59,202] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] 
[engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,591] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-10 11:10:59,695] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-10 11:10:59,695] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ will not load any checkpoints and will start from random
+time (ms) | load-checkpoint: 2.23
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
+ warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
+estimated model parameters: 125.2213504
+estimated model parameters: 103.3650944
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first 
and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings 
will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") + +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated 
model parameters: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters 
without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: 
UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.22432 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.22432 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.22432 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first 
and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.22432 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters 
without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.368064 +estimated model parameters without embeddings: 103.368064 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.368064estimated model parameters without embeddings: 103.368064 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-10 11:10:59 +> building train, validation, and test datasets ... 
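+A quick sanity check on the figures above — a minimal sketch, not the actual megatron/utils.py code path, assuming the tr8b-104B shape of 64 layers and hidden size 11600; the small spread between per-rank estimates (103.3650944 vs 103.368064, and 125.2213504 vs 125.22432 with embeddings) presumably comes from different pipeline stages holding slightly different parameter sets, including the duplicated embedding copies the UserWarning describes:
+
+# Rough transformer parameter count: each layer carries ~12*h^2 weights
+# (4*h^2 in attention, 8*h^2 in the MLP), ignoring biases and layernorms,
+# which is why the result lands slightly below the logged 103.3650944.
+n_layers, hidden = 64, 11600        # assumed tr8b-104B configuration
+no_emb = 12 * n_layers * hidden ** 2
+print(no_emb / 1e9)                 # ~103.34 billion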
+> building train, validation, and test datasets ...
+ > datasets target sizes (minimum size):
+    train:      600000000
+    validation: 3000320
+    test:       10240
+> building train, validation, and test datasets for GPT ...
+ > building dataset index ...
+    reading sizes...
+    reading pointers...
+    reading document index...
+    creating numpy buffer of mmap...
+    creating memory view of numpy buffer...
+ > finished creating indexed dataset in 0.035739 seconds
+    number of documents: 304230423
+ > dataset split:
+    train:
+     document indices in [0, 288714672) total of 288714672 documents
+    validation:
+     document indices in [288714672, 303926193) total of 15211521 documents
+    test:
+     document indices in [303926193, 304230423) total of 304230 documents
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.123 seconds
+    total number of samples: 657686117
+    total number of epochs: 5
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.107 seconds
+    total number of samples: 6927161
+    total number of epochs: 1
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.031 seconds
+    total number of samples: 137384
+    total number of epochs: 1
+> finished creating GPT datasets ...
+[after dataloaders are built] datetime: 2021-10-10 11:11:05
+done with setup ...
+training ...
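+The split boundaries logged above can be reproduced with Megatron-style split arithmetic (get_train_valid_test_split_); a minimal sketch, assuming the run used the usual "949,50,1" split string — that assumption checks out numerically against the logged indices:
+
+size = 304230423                        # number of documents, from the log
+weights = [949.0, 50.0, 1.0]            # assumed --split argument
+fracs = [w / sum(weights) for w in weights]
+idx = [0]
+for f in fracs:
+    idx.append(idx[-1] + int(round(f * size)))
+diff = idx[-1] - size                   # rounding drift, folded into every boundary
+idx = [i - diff if n else i for n, i in enumerate(idx)]
+print(idx)                              # [0, 288714672, 303926193, 304230423]
+
+The epoch count follows the same arithmetic: one pass over the train split yields about 657686117 / 5 ≈ 131.5M samples, and four passes (~526M) fall short of the 600000000-sample target, so five epochs are built.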
+Number of parameters: 125.2213504 billion
+time (ms) | model-and-optimizer-setup: 4922.44 | train/valid/test-data-iterators-setup: 4880.27
+Number of parameters: 103.3650944 billion
+Number of parameters: 125.22432 billion
+Number of parameters without embeddings: 103.3650944 billion
+Number of parameters without embeddings: 103.368064 billion
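+The Activation Checkpointing banner below reflects DeepSpeed's runtime settings for this run; a hedged sketch of setting the equivalent flags through the public deepspeed.checkpointing API (the real run drives this from the Megatron launch scripts and DeepSpeed JSON config, not necessarily this exact call):
+
+import deepspeed
+
+# All checkpointing optimizations are off, matching the banner below; mpu_
+# would be Megatron's model-parallel utilities object in a real run.
+deepspeed.checkpointing.configure(
+    mpu_=None,
+    partition_activations=False,     # "Partition Activations False"
+    checkpoint_in_cpu=False,         # "CPU CHECKPOINTING False"
+    contiguous_checkpointing=False,  # "contiguous Memory Checkpointing False"
+    synchronize=False,               # "Synchronization False"
+    profile=False,                   # "Profiling time in checkpointing False"
+)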
without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 125.22432 billion +Number of parameters: 103.3650944 billion + +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.368064 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +[before the start of training step] datetime: 2021-10-10 11:11:05 +[2021-10-10 11:11:05,177] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information +[2021-10-10 11:11:05,178] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False +[2021-10-10 11:11:05,178] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers +[2021-10-10 11:11:05,178] [INFO] [checkpointing.py:554:forward] ----Synchronization False +[2021-10-10 11:11:05,178] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False +Traceback (most recent call last): +Traceback (most recent 
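+(Editor's note, not part of the original log: the "Activation Checkpointing Information" block above reports the DeepSpeed settings in effect for this run. For readers unfamiliar with the technique, the sketch below illustrates the underlying idea with plain torch.utils.checkpoint -- activations inside the checkpointed block are dropped during the forward pass and recomputed during backward, trading compute for memory. It is an illustration only, not the DeepSpeed implementation used here.)
+
+import torch
+from torch.utils.checkpoint import checkpoint
+
+layer = torch.nn.Linear(1024, 1024)
+x = torch.randn(8, 1024, requires_grad=True)
+
+# Without checkpointing, intermediate activations are kept for backward.
+# With checkpointing, only the block's inputs are stored and the block is
+# re-run during backward to regenerate the activations it needs.
+y = checkpoint(layer, x)
+y.sum().backward()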
+Traceback (most recent call last):
+  File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 246, in <module>
+    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/training.py", line 165, in pretrain
+    iteration = train(forward_step_func,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/training.py", line 732, in train
+    train_step(forward_step_func,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/training.py", line 405, in train_step
+    loss = model[0].train_batch(data_iter=data_iterator)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/engine.py", line 329, in train_batch
+    self._exec_schedule(sched)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/engine.py", line 1313, in _exec_schedule
+    self._exec_instr(**cmd.kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/engine.py", line 631, in _exec_forward_pass
+    outputs = super().forward(inputs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/engine.py", line 1321, in forward
+    loss = self.module(*inputs, **kwargs)
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/module.py", line 352, in forward
+    x = self.activation_checkpoint_func(
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 743, in checkpoint
+    CheckpointFunction.apply(function, all_outputs, *args)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 582, in forward
+    outputs = run_function(*inputs_cuda)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/cutting-edge/deepspeed/deepspeed/runtime/pipe/module.py", line 330, in exec_func
+    inputs = layer(inputs)
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/transformer.py", line 588, in forward
+    return super().forward(hidden_states, attention_mask, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/transformer.py", line 479, in forward
+    self.self_attention(layernorm_output,
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/transformer.py", line 333, in forward
+    attention_probs = self.scale_mask_softmax(attention_scores,
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
+    result = self.forward(*input, **kwargs)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/fused_softmax.py", line 157, in forward
+    mask_output = self.mask_func(input, mask) if mask is not None else input
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/megatron/model/utils.py", line 43, in attention_mask_func
+    attention_scores.masked_fill_(attention_mask, -10000.0)
+RuntimeError: The expanded size of the tensor (64) must match the existing size (2048) at non-singleton dimension 3. Target sizes: [1, 20, 64, 64]. Tensor sizes: [1, 1, 2048, 2048]
+Killing subprocess 583904
+Killing subprocess 583905
+Killing subprocess 583906
+Killing subprocess 583908
+Traceback (most recent call last):
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main
+    return _run_code(code, main_globals, None,
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code
+    exec(code, run_globals)
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
+    main()
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
+    sigkill_handler(signal.SIGTERM, None)  # not coming back
+  File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
+    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
+subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1504567.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1.
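+(Editor's note, not part of the original log: the launch command above pins down the model geometry -- 64 layers, hidden size 11600, 80 heads, 2048-token sequences. The sketch below is a back-of-the-envelope check, using the standard GPT sizing rule of thumb, that these flags are consistent with the "103.3650944 billion parameters without embeddings" lines logged earlier; the small residual is biases, layernorms, and position embeddings, which the rule ignores.)
+
+# Rough GPT parameter count: each transformer layer holds ~12*h^2 weights
+# (4*h^2 for attention QKV+projection, 8*h^2 for the MLP with 4x expansion).
+num_layers = 64
+hidden = 11600
+
+per_layer = 12 * hidden ** 2
+total = num_layers * per_layer
+print(f"{total / 1e9:.2f}B")  # -> 103.34B, close to the logged 103.365B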
+srun: error: r7i4n5: task 1: Exited with exit code 1
+srun: Terminating job step 1504567.0
+slurmstepd: error: *** STEP 1504567.0 ON r7i4n4 CANCELLED AT 2021-10-10T11:11:11 ***
+Killing subprocess 1309400
+Killing subprocess 1309401
+Killing subprocess 1309402
+Killing subprocess 1309403
+Main process received SIGTERM, exiting
+srun: error: r9i3n3: task 19: Exited with exit code 1
+srun: error: r9i3n0: task 16: Exited with exit code 1
+srun: error: r7i4n6: task 2: Exited with exit code 1
+srun: error: r7i5n6: task 11: Exited with exit code 1
+srun: error: r9i5n3: task 29: Exited with exit code 1
+srun: error: r7i5n2: task 7: Exited with exit code 1
+srun: error: r7i6n0: task 14: Exited with exit code 1
+srun: error: r9i4n0: task 25: Exited with exit code 1
+srun: error: r9i3n8: task 24: Exited with exit code 1
+srun: error: r7i5n8: task 13: Exited with exit code 1
+srun: error: r9i3n5: task 21: Exited with exit code 1
+srun: error: r9i3n7: task 23: Exited with exit code 1
+srun: error: r7i4n7: task 3: Exited with exit code 1
+srun: error: r9i3n1: task 17: Exited with exit code 1
+srun: error: r7i4n4: task 0: Exited with exit code 1
+srun: error: r9i3n4: task 20: Exited with exit code 1
+srun: error: r9i3n6: task 22: Exited with exit code 1
+srun: error: r7i5n5: task 10: Exited with exit code 1
+srun: error: r7i5n4: task 9: Exited with exit code 1
+srun: error: r9i2n8: task 15: Exited with exit code 1
+srun: error: r9i5n5: task 31: Exited with exit code 1
+srun: error: r9i5n2: task 28: Exited with exit code 1
+srun: error: r9i4n2: task 27: Exited with exit code 1
+srun: error: r7i5n7: task 12: Exited with exit code 1
+srun: error: r9i5n4: task 30: Exited with exit code 1
+srun: error: r9i4n1: task 26: Exited with exit code 1
+srun: error: r9i3n2: task 18: Exited with exit code 1
+srun: error: r7i5n0: task 5: Exited with exit code 1
+srun: error: r7i4n8: task 4: Exited with exit code 1
+srun: error: r7i5n3: task 8: Exited with exit code 1
+srun: error: r7i5n1: task 6: Exited with exit code 1
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
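+(Editor's note, not part of the original log: the banner above is printed by the PyTorch launcher when it defaults OMP_NUM_THREADS to 1 for every worker it spawns. If that default is too conservative, the knob can be set before the workers start; a minimal sketch, assuming a simple single-node setup -- in practice it is usually exported in the job script, e.g. OMP_NUM_THREADS=4 srun ...)
+
+import os
+import torch
+
+# Must be set before numeric libraries spin up their thread pools.
+os.environ.setdefault("OMP_NUM_THREADS", "4")
+torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))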
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+ runtime if needed. Op compatibility means that your system
+ meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
[OKAY] +[OKAY][OKAY][OKAY]-------------------------------------------------- + + + +--------------------------------------------------op name---------------------------------------------------------------------------------------------------- + + +................op nameop name op name installed................ ................................ .. installed installedinstalled compatible .. +.... -------------------------------------------------- compatible +compatible +compatible + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +cpu_adam ............... [YES] ...... [OKAY]cpu_adamcpu_adam + cpu_adam............... ............... [YES] ............... [YES] ......[YES]fused_adam ............[OKAY]............. +[OKAY][OKAY][NO] + +....... [OKAY] +fused_lamb fused_adam............. fused_adam.............[NO]fused_adam .................................[NO] [NO] .......[OKAY] [NO]....... + [OKAY] .......[OKAY] + +[OKAY] +fused_lamb fused_lamb.............fused_lamb sparse_attn ............. [NO]......................... [NO].......[NO] [NO] [OKAY].............. ....... +[OKAY][OKAY] + +[OKAY] +transformer ............ [NO] ....... [OKAY]sparse_attn + sparse_attn............ stochastic_transformer[NO] .............sparse_attn....... [NO] [NO][OKAY] ............ + .......[NO]....... transformer .......[OKAY][OKAY] + +............[OKAY] +transformer[NO] transformer................... ............[NO] [OKAY] [NO] +....... .......[OKAY] stochastic_transformer +[OKAY] +stochastic_transformer .stochastic_transformer .[NO] [NO]........ .......[OKAY][NO] + [OKAY]....... + [OKAY] +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] + + + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + + +op nameop name op nameop name ................ ................ ................installed................ installedinstalledinstalled.. ...... compatible compatible +compatiblecompatible +-------------------------------------------------- + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +cpu_adam ............... [YES]cpu_adamcpu_adam cpu_adam ............... ..................... [YES]...............[OKAY][YES] + ......[YES]...... [OKAY]......[OKAY] + + [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_adamfused_adamfused_adam fused_lamb............. ............. ............. [NO] .............[NO] [NO]....... ....... [NO] .......[OKAY][OKAY] + +[OKAY]....... + [OKAY]fused_lambfused_lamb + fused_lamb .......................... .............[NO][NO] .......[NO]....... [OKAY]....... +sparse_attn [OKAY] [OKAY] +............ + [NO] ....... [OKAY] +transformer ............sparse_attn [NO]............sparse_attnsparse_attn [NO]................... ............ ....... [NO][OKAY] [NO] +[OKAY] ....... +....... [OKAY][OKAY]transformer + + stochastic_transformer............ transformer transformer [NO] . ........................ ....... [NO][NO] [NO] [OKAY]....... ....... + ....... [OKAY] [OKAY] +[OKAY] + +stochastic_transformer stochastic_transformerstochastic_transformer. .[NO]. .......[NO][NO] [OKAY].............. + [OKAY][OKAY] + +ninjaninjaninjaninja .................. 
.................. [OKAY]....................................[OKAY] + + --------------------------------------------------[OKAY]-------------------------------------------------- +[OKAY] + +op name +-------------------------------------------------- op name--------------------------------------------------................ + +................installedop name op name installed.................................. .. compatibleinstalledinstalled + compatible --------------------------------------------------.. +.. + -------------------------------------------------- compatible +ninjaninjaninjaninja ...................................................... .................. [OKAY][OKAY][OKAY] +compatible + +---------------------------------------------------------------------------------------------------- +cpu_adam + ............... [YES]cpu_adam ..................... [OKAY][YES]cpu_adamcpu_adam + .................................... [OKAY][YES][YES] +[OKAY] +-------------------------------------------------- + + +-------------------------------------------------- +--------------------------------------------------op name-------------------------------------------------- + +op name................op name op nameinstalled................................ ................installed.. installed installed ..compatible.. +.. --------------------------------------------------compatiblecompatiblecompatible + + + +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + ............ fused_adam[OKAY] [OKAY] +............. + [NO] fused_adam....... .............[OKAY] +[NO] ....... fused_adamfused_lamb[OKAY]fused_adam +cpu_adam cpu_adamcpu_adam...............cpu_adam .............................................[YES] ...... [YES][YES][OKAY][YES] + ............. .............[NO].............fused_lamb [NO] .......[NO] ............. ..............[OKAY][NO] +[OKAY].......[OKAY] + +[OKAY] +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] + + + + .................. [OKAY][OKAY][OKAY] + + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + + +fused_adam ............. [NO] ....... fused_adamfused_adam[OKAY]fused_adam +fused_lambfused_lamb .......................... [NO][NO] sparse_attn.............. sparse_attn............[OKAY][OKAY] + +op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled ...... .. compatiblecompatible compatible + + +compatible---------------------------------------------------------------------------------------------------- + +-------------------------------------------------- + +-------------------------------------------------- + .......................................fused_lamb [NO][NO]............. [NO] ..............[NO] .............. [OKAY][OKAY] +[NO]............ .......[NO] [OKAY]....... + [OKAY] + [OKAY] +[OKAY] + +fused_lambfused_lamb .............fused_lamb............. [NO].............[NO] .......[NO]....... sparse_attn [OKAY].......[OKAY]............ + +transformer ............transformer sparse_attn [NO]sparse_attn ............ ................... ............[NO] [NO][OKAY] .......[NO] + ....... 
[OKAY] +.......[OKAY] stochastic_transformer +cpu_adamcpu_adam cpu_adam ...............cpu_adam .............................. ...............[YES][YES][YES] [YES]............ ...... [OKAY]...... +stochastic_transformer[OKAY] transformer +[OKAY][OKAY] + +[OKAY] + [NO][OKAY] +....... [OKAY] +. . transformer............ [NO] [NO] ............ .......[NO] .......[NO][OKAY] +fused_adam ............. fused_adam[NO] .............fused_adamfused_adam....... [OKAY]..........................[NO] +transformer ............sparse_attn sparse_attn[NO] ............ .......sparse_attn[NO]............ [OKAY] ............ +.......[NO] [NO][OKAY].......stochastic_transformer +.......[OKAY]....... + [OKAY][OKAY] + + [NO][NO]....... fused_lamb .............. [OKAY][OKAY][OKAY] +............. + + .......[OKAY] +[OKAY].transformer +stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY] +[OKAY] + [NO] fused_lamb.......fused_lamb fused_lamb [OKAY]....................................... + transformer[NO]............transformer ...................[NO]............ [NO] [OKAY] ....... + [NO][NO][NO] ..................... [OKAY] [OKAY] +[OKAY] + +[NO] ....... [OKAY] ....... +[OKAY] +[OKAY] +sparse_attn ............ [NO] ....... [OKAY] +stochastic_transformerstochastic_transformer stochastic_transformer. . [NO].[NO] .......[NO]....... [OKAY] +.......[OKAY] +[OKAY] +transformersparse_attnsparse_attnsparse_attn ............ ............ ........................ [NO] [NO][NO] [NO] ....... .............. ....... [OKAY] [OKAY][OKAY] + +[OKAY] + +transformertransformerstochastic_transformer transformer ............ ............ .............[NO][NO] [NO].............. [NO] [OKAY].......[OKAY] + + .......[OKAY] +stochastic_transformer[OKAY] stochastic_transformer + . .[NO] stochastic_transformer....... [NO][OKAY]. +....... [NO][OKAY] +....... [OKAY] +ninjaninjaninjaninja ...................................................... [OKAY] .................. +[OKAY][OKAY]-------------------------------------------------- + + +[OKAY]op name---------------------------------------------------------------------------------------------------- + + +................--------------------------------------------------op name op name installed + ................ ..................op nameinstalled compatible +..installed................ -------------------------------------------------- compatible +..installed + --------------------------------------------------compatible.. + + compatible-------------------------------------------------- + +cpu_adam-------------------------------------------------- +............... [YES]cpu_adam .....................cpu_adam [OKAY] [YES] +cpu_adam ..................... ...............[YES][OKAY] +......[YES] [OKAY]......fused_adam + .............[OKAY] [NO] + .......fused_adam [OKAY]............. + [NO]fused_adam ....... fused_lamb .............[OKAY]fused_adam............. + [NO][NO]............. fused_lamb ....... .......[NO]............. [OKAY][OKAY][NO]....... + + .......[OKAY] [OKAY] +fused_lamb + ............. fused_lamb[NO] .................... sparse_attn[NO][OKAY] +................... sparse_attn[NO][OKAY] ............ +....... [NO][OKAY] +....... [OKAY] +transformer ............sparse_attn transformer [NO] ............sparse_attn ............ ....... [NO][NO] ............ [OKAY]....... +....... [NO] [OKAY] [OKAY]stochastic_transformer +....... + [OKAY].transformer +stochastic_transformer [NO]transformer............. ...................[NO] [NO] [OKAY].............. 
+[NO] [OKAY][OKAY] ....... + + [OKAY] +stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.--------------------------------------------------DeepSpeed C++/CUDA extension op report + + + +----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja + + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] + +[OKAY] +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + +--------------------------------------------------op nameop name + op name................ ................op name ................ installed installed................ installed .. .... installed compatiblecompatible +..compatible + ---------------------------------------------------------------------------------------------------- +compatible + + +-------------------------------------------------- +-------------------------------------------------- +cpu_adamcpu_adam cpu_adam...............cpu_adam ...............[YES].............................. [YES][YES] ...... [YES] [OKAY]............ + ......[OKAY] +[OKAY][OKAY] + +fused_adam ............. [NO]fused_adamfused_adam fused_adam ....... ....................................... [NO] [OKAY][NO] [NO] +....... .......[OKAY].......fused_lamb + [OKAY][OKAY] +............. +fused_lamb [NO]............. fused_lamb .......[NO] fused_lamb ............. [OKAY] ....... + .............[NO][OKAY] +[NO]....... [OKAY]....... + [OKAY] +sparse_attn ............ [NO] sparse_attn....... ............[OKAY] +[NO]sparse_attn transformersparse_attn................... ........................[NO][OKAY] [NO] + .......[NO]....... transformer [OKAY].......[OKAY]............ + + [OKAY][NO]transformer + stochastic_transformer................... transformer [OKAY] .[NO] +............ [NO].......stochastic_transformer[NO] ....... ....... [OKAY].[OKAY][OKAY] + + +[NO] ....... stochastic_transformer[OKAY]stochastic_transformer + . .[NO] [NO]....... 
.......[OKAY] +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- +-------------------------------------------------- + +JIT compiled ops requires ninja-------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +DeepSpeed C++/CUDA extension op report +JIT compiled ops requires ninja +-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................. .................. [OKAY].................. [OKAY] + [OKAY] +[OKAY]-------------------------------------------------- +-------------------------------------------------- + + +--------------------------------------------------op name-------------------------------------------------- +op name + op name................................op name ................installedinstalled................ installed.. .. installed ..compatiblecompatible + +..compatible---------------------------------------------------------------------------------------------------- + + +compatible-------------------------------------------------- + +-------------------------------------------------- +cpu_adamcpu_adam cpu_adam.............................. cpu_adam [YES][YES]............... ...... ............... ...... [OKAY][YES][OKAY][YES] + + ............ [OKAY][OKAY] + +fused_adam fused_adam............. .............[NO] fused_adamfused_adam [NO] .................... .................... [OKAY][NO] +[OKAY][NO] + fused_lamb.............. [OKAY][OKAY]fused_lamb............. + + fused_lamb fused_lamb.............[NO]............. [NO] ............. [NO]....... ....... [NO] ....... [OKAY][OKAY]....... + + [OKAY][OKAY] + +sparse_attnsparse_attn ............ sparse_attn sparse_attn............[NO] ............ ............[NO] ....... [NO] ....... [NO].......[OKAY][OKAY] ....... + +[OKAY] [OKAY]transformer + + transformer............ transformertransformer ............ [NO] ........................ [NO] ....... [NO][NO]....... [OKAY] .......[OKAY] +....... + [OKAY][OKAY] + +stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer. . [NO]..[NO] .......[NO][NO] ....... [OKAY] .............. 
+[OKAY] + [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninjaninja .................................... [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- + +op nameop name ................................ installedinstalled .... compatiblecompatible + +---------------------------------------------------------------------------------------------------- + +cpu_adamcpu_adam .............................. [YES] [YES]...... ......[OKAY] +[OKAY] +fused_adam fused_adam............. .............[NO] [NO]....... .......[OKAY] +[OKAY] +fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] +[OKAY] +sparse_attnsparse_attn ........................ [NO][NO] ....... .......[OKAY] +[OKAY] +transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] + +stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +--------------------------------------------------JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +---------------------------------------------------------------------------------------------------- + + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja-------------------------------------------------- + +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report +JIT compiled ops requires ninja +-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY][OKAY] + +[OKAY] +-------------------------------------------------- + +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +op nameop name + op name op name................ ................................ installed................installedinstalled .... ..installed compatible compatiblecompatible + +.. +-------------------------------------------------- -------------------------------------------------- +--------------------------------------------------compatible + + +-------------------------------------------------- +cpu_adamcpu_adam cpu_adam.............................. cpu_adam ............... [YES][YES] ............... [YES] ......[YES]...... [OKAY]......[OKAY]...... + +[OKAY] +[OKAY] +fused_adamfused_adam .............fused_adamfused_adam............. [NO].............[NO]............. [NO]..............[NO] .......[OKAY][OKAY]....... + + [OKAY][OKAY] + +fused_lamb fused_lambfused_lambfused_lamb ............. ............. .......................... [NO] [NO] [NO][NO] ....... ....... .............. 
[OKAY] [OKAY] +[OKAY][OKAY] + + +sparse_attnsparse_attnsparse_attnsparse_attn ........................ ........................ [NO][NO] [NO] [NO] ....... .............. ....... [OKAY][OKAY] + +[OKAY][OKAY] +transformer +transformer transformer............transformer............ ............[NO] ............ [NO] [NO] .......[NO] ....... ....... ....... [OKAY][OKAY][OKAY] + +[OKAY] + +stochastic_transformerstochastic_transformer stochastic_transformerstochastic_transformer .. .[NO].[NO] [NO]..............[NO] ..............[OKAY][OKAY] + +[OKAY][OKAY] + +ninjaninjaninja ninja.................. .................. ..................[OKAY][OKAY].................. + + [OKAY]--------------------------------------------------[OKAY]-------------------------------------------------- + + + +op name--------------------------------------------------op name-------------------------------------------------- + + ................op nameop name ................ installed ................ ................ installed..installedinstalled ....compatible.. compatible + +compatiblecompatible---------------------------------------------------------------------------------------------------- + + + +---------------------------------------------------------------------------------------------------- + +cpu_adamcpu_adamcpu_adamcpu_adam ............... ............... ............... [YES]............... [YES] [YES] [YES]............ ......[OKAY] [OKAY] +[OKAY]...... + + [OKAY] +fused_adam .............fused_adamfused_adam fused_adam.............[NO]............. .............[NO][NO]....... [NO].......[OKAY] ....... +[OKAY]....... + [OKAY] +[OKAY]fused_lamb +fused_lamb fused_lamb............. fused_lamb[NO].......................... ....................[NO][NO] [OKAY][NO]....... +....... [OKAY].......[OKAY] + +[OKAY] +sparse_attn ............ [NO]sparse_attn .......sparse_attn............ sparse_attn[OKAY] ........................[NO] + transformer[NO][NO]....... ..........................[OKAY] [OKAY] +[NO][OKAY] + .......transformer +transformer [OKAY]............transformer............ + [NO] [NO] ............ stochastic_transformer....... ....... [NO][OKAY].[OKAY] + + [NO]....... .......stochastic_transformerstochastic_transformer[OKAY] [OKAY] + +. .[NO] stochastic_transformer [NO] ....... ....... .[OKAY][OKAY] + +[NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +-------------------------------------------------- +-------------------------------------------------- +JIT compiled ops requires ninja-------------------------------------------------- +JIT compiled ops requires ninja + + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] +[OKAY] +[OKAY] +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +op nameop name + op name................ op name................ ................ installed................installedinstalled ..installed.. .. compatible compatible +..compatible +-------------------------------------------------- + +-------------------------------------------------- -------------------------------------------------- +compatible + +-------------------------------------------------- +cpu_adam ............... [YES]cpu_adam cpu_adam.....................cpu_adam [YES][OKAY] ............... + ............... ...... [YES] [OKAY][YES]...... + [OKAY]......fused_adam + .............[OKAY] +[NO] .......fused_adam [OKAY]............. + [NO]fused_adam ....................fused_lamb fused_adam[OKAY].............[NO] +....................[NO] fused_lamb.......[OKAY][NO] + .............[OKAY]....... +[NO] fused_lamb [OKAY]....... + .............[OKAY] +[NO]fused_lamb .......sparse_attn............. [OKAY]............ + [NO][NO] ..............sparse_attn [OKAY][OKAY]............ + + sparse_attn[NO] transformer ............ ....... ............ [NO] [OKAY] [NO] +....... .......[OKAY] transformer +[OKAY]sparse_attn +transformer............ ........................[NO]stochastic_transformer [NO][NO]....... . .......[OKAY] [NO] + [OKAY].............. +stochastic_transformer [OKAY][OKAY]. stochastic_transformer + +[NO] ........ [OKAY]transformer[NO] + ................... [OKAY][NO] + ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. 
+-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +DeepSpeed C++/CUDA extension op report--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report + + + +--------------------------------------------------JIT compiled ops requires ninja +-------------------------------------------------- +JIT compiled ops requires ninja + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + + +JIT compiled ops requires ninja-------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- + + +--------------------------------------------------JIT compiled ops requires ninja +DeepSpeed C++/CUDA extension op report +JIT compiled ops requires ninja + +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] + + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + + +op nameop nameop nameop name ................................................................ installed installedinstalled installed .. .. .... compatible compatiblecompatible +compatible + +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- + +cpu_adamcpu_adam cpu_adam ............... ...............cpu_adam ...............[YES] [YES][YES]............... ...... ...... ......[YES] [OKAY] [OKAY] +[OKAY] +...... + [OKAY] +fused_adamfused_adamfused_adam .......................................fused_adam [NO][NO][NO] ............. .............. ....... [NO] [OKAY][OKAY][OKAY] + +....... + [OKAY]fused_lamb +fused_lamb .............fused_lamb............. [NO]fused_lamb.............[NO] ....................[NO]....... .......[OKAY][NO][OKAY] + +[OKAY]....... + [OKAY] +sparse_attnsparse_attn sparse_attn ........................ ............[NO]sparse_attn[NO] ....... [NO]....... ............ [OKAY] .......[OKAY] +[NO] +[OKAY] +.......transformer transformer transformer............[OKAY] + ............[NO]............ [NO].......[NO]transformer [OKAY].......................... + [OKAY][OKAY][NO] + +stochastic_transformer ....... [OKAY].stochastic_transformerstochastic_transformer + [NO] .........stochastic_transformer [NO][NO][OKAY] + ............... [OKAY][NO][OKAY] + +....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] + + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + +op name +op name op name ................op name ................ ................installedinstalled................ ..installed.. installed compatible....compatible + +--------------------------------------------------compatible-------------------------------------------------- +compatible + + +-------------------------------------------------- +-------------------------------------------------- +cpu_adam cpu_adam............... cpu_adam...............cpu_adam[YES] [YES].................................... ......[OKAY][YES] [YES][OKAY] + +............ [OKAY][OKAY] + +fused_adamfused_adam .......................... fused_adam [NO][NO]fused_adam............. .................... ....... [NO][NO] [OKAY][OKAY] +....... +....... fused_lamb[OKAY][OKAY] +fused_lamb............. + .............fused_lamb[NO] fused_lamb [NO]............. ........................... [OKAY] [NO][NO] + [OKAY] ....... +....... [OKAY][OKAY] + +sparse_attn ............ [NO] sparse_attn....... sparse_attn............sparse_attn[OKAY] + [NO]........................ 
transformer ....... [NO][OKAY][NO]............ + .......[NO]....... transformer [OKAY] .......[OKAY] +............ + [OKAY]transformer[NO] +transformer ............................... stochastic_transformer [NO][NO][OKAY] + ............... [OKAY][NO][OKAY]stochastic_transformer + ....... + [OKAY] +stochastic_transformer. [NO]stochastic_transformer . ....... .[NO][OKAY] + [NO]....... .......[OKAY] +[OKAY] +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + + +--------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- + + + +--------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatibleninja + --------------------------------------------------.................. +[OKAY] +-------------------------------------------------- +op name ................ installed ..cpu_adam compatible +...............-------------------------------------------------- +[YES] ...... [OKAY] +cpu_adam ............... [YES] ...... [OKAY] +fused_adam fused_adam............. ............. [NO][NO] .............. [OKAY] +[OKAY] +fused_lamb ............. [NO] fused_lamb....... [OKAY]............. + [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +sparse_attntransformer ........................ [NO] [NO]....... .......[OKAY] + [OKAY] +stochastic_transformertransformer ............. [NO][NO] ....... [OKAY]....... + [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninjaninjaninjaninja ...................................................... .................. 
[OKAY][OKAY][OKAY] + +[OKAY] +-------------------------------------------------- + +------------------------------------------------------------------------------------------------------------------------------------------------------op name + + + op nameop name................ op name installed................................ ................ .. installed installedinstalled compatible.. .. + .. --------------------------------------------------compatible +compatiblecompatible + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +cpu_adam ............... [YES] ...... cpu_adam[OKAY]cpu_adam +cpu_adam ............... ............... ............... [YES] [YES] [YES] ...... fused_adam...... ...... [OKAY].............[OKAY] + +[OKAY] +[NO] ....... [OKAY] +fused_adamfused_lamb ..........................fused_adam fused_adam [NO][NO] ............. .................... ....... [NO][OKAY][NO] + [OKAY].............. + [OKAY][OKAY] + +fused_lamb ............. [NO]fused_lambsparse_attnfused_lamb ............................................. [OKAY][NO] [NO] + [NO] ..................... [OKAY][OKAY][OKAY] + + +transformer sparse_attn............ [NO]............ .......[NO] sparse_attnsparse_attn[OKAY] +....... ........................ stochastic_transformer [OKAY][NO] [NO] + ............... transformer [OKAY][NO][OKAY]............ + +....... transformer [NO] transformer[OKAY] +............................... [NO][OKAY][NO] +.............. [OKAY][OKAY]stochastic_transformer + + . [NO]stochastic_transformer stochastic_transformer ....... .[OKAY]. +[NO] [NO]....... .......[OKAY] + [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +-------------------------------------------------- +DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja + + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. 
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+ runtime if needed. Op compatibility means that your system
+ meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
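This report is printed once per launched process; ops flagged as not installed are compiled on first use. As a minimal sketch (assuming only that torch and deepspeed are importable in this environment), constructing a DeepSpeed op wrapper is what triggers the ninja JIT build the NOTE describes:

# Minimal sketch, not part of the log: the first construction of a
# DeepSpeed op wrapper JIT-compiles its C++/CUDA extension via ninja.
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

params = [torch.nn.Parameter(torch.zeros(10))]
optimizer = DeepSpeedCPUAdam(params, lr=1e-3)  # builds cpu_adam here if it was not prebuilt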
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
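In the table above, cpu_adam is the only op pre-installed in this build ([YES]); every other op reads [NO] ....... [OKAY], meaning it is absent but JIT-installable. A hedged sketch of querying the same compatibility flags programmatically; the deepspeed.ops.op_builder import path is the one exposed by recent DeepSpeed releases (older releases shipped a top-level op_builder package), so treat it as an assumption:

# Sketch, assuming op builders are importable from deepspeed.ops.op_builder;
# is_compatible() is the check behind the "compatible" column above.
from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder, FusedLambBuilder

for builder in (CPUAdamBuilder(), FusedAdamBuilder(), FusedLambBuilder()):
    status = '[OKAY]' if builder.is_compatible() else '[NO]'
    print(builder.name, status)

The full table, as it appears in this log, is what DeepSpeed's ds_report utility prints.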
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
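async_io is the only op reported as both not installed and not compatible ([NO] ....... [NO]): the libaio development files it needs are missing, exactly as the warnings say. A standard-library-only sketch for checking whether the shared object is visible to the dynamic linker at all:

# Generic sketch: find_library() returns the library name if the dynamic
# linker can locate libaio, or None (the situation the warnings above report).
import ctypes.util

print(ctypes.util.find_library('aio'))

The remedies are the ones the log itself lists: install the libaio-devel package with yum, or, for a source-built libaio, point the CFLAGS and LDFLAGS environment variables at its headers and library.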
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
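The environment block is plain version introspection; a minimal equivalent built from public attributes (the +cd7967d suffix in the log indicates a source build at that commit, while the nvcc line comes from the CUDA toolchain rather than anything importable):

# Sketch reproducing the importable "DeepSpeed general environment info"
# fields; values shown in this log: torch 1.8.1, CUDA 11.1, deepspeed 0.5.5.
import torch
import deepspeed

print('torch version .....', torch.__version__)
print('torch cuda version ', torch.version.cuda)
print('deepspeed path ....', deepspeed.__path__)
print('deepspeed version .', deepspeed.__version__)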
.......[OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version DeepSpeed general environment info:..................... 11.2 + +deepspeed install path ........... torch install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +...............deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']...... + torch 1.8, cuda 11.1 +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] + [WARNING]  async_io: please install the libaio-devel package with yum +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... DeepSpeed general environment info:[OKAY] + +-------------------------------------------------- +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
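The warnings and the op report above were printed once per rank; duplicates are collapsed to a single copy here. The async_io op fails to build because the libaio development files are missing on the compute nodes, and it is only exercised by NVMe offload, so training can proceed without it. A minimal remediation sketch in line with the warning, assuming a yum-based node image and a rebuild of DeepSpeed afterwards (DS_BUILD_AIO is DeepSpeed's opt-in build flag for this op; the /opt/libaio prefix is a hypothetical example):

$ sudo yum install libaio-devel
$ # or, if libaio was built from source into a custom prefix (hypothetical path):
$ CFLAGS="-I/opt/libaio/include" LDFLAGS="-L/opt/libaio/lib" DS_BUILD_AIO=1 pip install --no-cache-dir deepspeed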
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
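git is not on PATH inside the job environment, so Megatron records git_hash=unknown and git_branch=unknown. This only affects logging, not training. A sketch of restoring the recorded hash, assuming git is available as an environment module (the module name is hypothetical; any way of putting git on PATH before launch works):

$ type git || module load git
$ git -C /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B rev-parse --short HEAD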
.......[OKAY] +[OKAY] +-------------------------------------------------- +-------------------------------------------------- +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + set_global_variables(extra_args_provider=extra_args_provider, + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + _GLOBAL_TOKENIZER = build_tokenizer(args) +torch version .................... 1.8.1 + self.encoder = json.load(open(vocab_file)) +torch cuda version ............... 11.1 + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- [WARNING]  async_io: please install the libaio-devel package with yum + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + [WARNING]  async_io: please install the libaio-devel package with yum +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +async_io ............... [NO] ....... [NO] +torch version .................... 1.8.1 +transformer_inference .. [NO] ....... [OKAY] +torch cuda version ............... 11.1 +utils .................. [YES] ...... [OKAY] +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 + [WARNING]  async_io: please install the libaio-devel package with yum +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... 
[OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +/bin/sh: line 0: type: git: not found + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +/bin/sh: line 0: type: git: not found + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +_ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + tokenizer = _GPT2BPETokenizer(args.vocab_file, 
args.merge_file)tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace',self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +DeepSpeed general environment info: +DeepSpeed general environment info:torch install path + ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +.................... 1.8.1torch version + ....................torch cuda version 1.8.1............... + 11.1torch cuda version + nvcc version............... .....................11.1 +11.2nvcc version + deepspeed install path..................... ...........11.2 +deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']........... + deepspeed info ...................['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +0.5.5+cd7967d, cd7967d, master +deepspeed info deepspeed wheel compiled w.................... ......0.5.5+cd7967d, cd7967d, master +torch 1.8, cuda 11.1 +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + self.encoder = json.load(open(vocab_file)) +self.encoder = json.load(open(vocab_file)) +FileNotFoundErrorFileNotFoundError: : [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json'[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... 
[OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +DeepSpeed general environment info: +async_io ............... [NO] ....... [NO] +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +transformer_inference .. [NO] ....... [OKAY] +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +utils .................. [YES] ...... [OKAY] +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +quantizer .............. [NO] ....... [OKAY] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +/bin/sh: line 0: type: git: not found + initialize_megatron(extra_args_provider=extra_args_provider, +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version ............... 11.1 +nvcc version .................................... 11.111.2 + +nvcc versiondeepspeed install path ................................ 11.2 +deepspeed install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +........... deepspeed info ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... + deepspeed infotorch 1.8, cuda 11.1 +................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found + _ = _build_tokenizer(args) +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +DeepSpeed general environment info: +/bin/sh: line 0: type: git: not found +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +async_io ............... [NO] ....... [NO] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +DeepSpeed general environment info: +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch version .................... 1.8.1 +torch cuda version ............... 
+/bin/sh: line 0: type: git: not found
+**** Git info for Megatron: git_hash=unknown git_branch=unknown ****
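The git_hash=unknown is likewise benign: the compute nodes have no git binary on PATH (hence `/bin/sh: line 0: type: git: not found`), so the launcher's git probe falls back to "unknown". A hedged sketch of such a probe; git_info is a hypothetical stand-in, not the actual Megatron-DeepSpeed code:

    # Sketch of a git-metadata probe that degrades to "unknown" when git is
    # absent from PATH, as on these compute nodes. Hypothetical helper.
    import shutil
    import subprocess

    def git_info(repo_dir="."):
        if shutil.which("git") is None:   # mirrors `/bin/sh: type: git: not found`
            return "unknown", "unknown"
        def run(*args):
            return subprocess.check_output(("git", "-C", repo_dir) + args, text=True).strip()
        return run("rev-parse", "--short", "HEAD"), run("rev-parse", "--abbrev-ref", "HEAD")

    print("**** Git info for Megatron: git_hash=%s git_branch=%s ****" % git_info())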
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', +self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + self.encoder = json.load(open(vocab_file))self.encoder = json.load(open(vocab_file)) + +FileNotFoundErrorFileNotFoundError : self.encoder = json.load(open(vocab_file))[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json': + +[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master + self.encoder = json.load(open(vocab_file)) +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
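Every rank dies at the same spot: the GPT-2 BPE tokenizer eagerly opens the vocab file during initialize_megatron, so a bad --vocab-file path surfaces as this FileNotFoundError before any training step runs. A minimal illustrative sketch of that load with an explicit existence check; load_gpt2_bpe_files is a hypothetical helper, not the Megatron-DeepSpeed source:

    # Sketch of the failing step (gpt2_tokenization.py line 164 does roughly
    # the json.load shown in the traceback). Checking the paths first turns
    # the bare FileNotFoundError into an actionable message.
    import json
    import os

    def load_gpt2_bpe_files(vocab_file, merge_file):
        for path in (vocab_file, merge_file):
            if not os.path.isfile(path):
                raise FileNotFoundError(
                    f"GPT-2 BPE asset missing: {path} (check --vocab-file/--merge-file)")
        with open(vocab_file, encoding="utf-8") as f:
            encoder = json.load(f)                 # token string -> id mapping
        with open(merge_file, encoding="utf-8") as f:
            merges = f.read().split("\n")[1:-1]    # ranked BPE merge rules
        return encoder, merges

Per the traceback, the missing asset is data/gpt2-vocab.json under the Megatron-DeepSpeed-tr8b-104B checkout; note also that the job mixes two checkouts (pretrain_gpt.py from Megatron-DeepSpeed-tr8b-104B, the megatron package from Megatron-DeepSpeed-conglongli), as the File lines above show.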
torch 1.8, cuda 11.1 + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +**** Git info for Megatron: git_hash=unknown 
git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +/bin/sh: line 0: type: git: not found + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, +initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider,set_global_variables(extra_args_provider=extra_args_provider, + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args)_ = _build_tokenizer(args) + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) +_GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file)tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace',self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + self.encoder = json.load(open(vocab_file)) + self.encoder = json.load(open(vocab_file)) +FileNotFoundErrorFileNotFoundError: : [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +initialize_megatron(extra_args_provider=extra_args_provider, + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + set_global_variables(extra_args_provider=extra_args_provider, + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + [WARNING]  async_io: please install the libaio-devel package with yum + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + self.encoder = json.load(open(vocab_file)) +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + [WARNING]  async_io: please install the libaio-devel package with yum + _ = _build_tokenizer(args) + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + _ = _build_tokenizer(args) +async_io ............... [NO] ....... [NO] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +transformer_inference .. [NO] ....... [OKAY] + _GLOBAL_TOKENIZER = build_tokenizer(args) +utils .................. [YES] ...... [OKAY] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +/bin/sh: line 0: type: git: not found + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... 
[NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +DeepSpeed general environment info: +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + [WARNING]  async_io: please install the libaio-devel package with yum +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +DeepSpeed general environment info: + _ = _build_tokenizer(args) +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 
11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +Traceback (most recent call last): + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +/bin/sh: line 0: type: git: not found + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + initialize_megatron(extra_args_provider=extra_args_provider, + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, +initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + set_global_variables(extra_args_provider=extra_args_provider, +transformer_inference .. [NO] ....... [OKAY] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +utils .................. [YES] ...... [OKAY] + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +_ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +DeepSpeed general environment info: +torch install path ............... 
['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +Traceback (most recent call last): +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ +tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + pretrain(train_valid_test_datasets_provider, model_provider, forward_step,pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + self.encoder = json.load(open(vocab_file)) + self.encoder = json.load(open(vocab_file)) + set_global_variables(extra_args_provider=extra_args_provider, +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +FileNotFoundError: [Errno 2] No such file or directory: 
'/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + [WARNING]  async_io: please install the libaio-devel package with yum + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + self.encoder = json.load(open(vocab_file)) +async_io ............... [NO] ....... [NO] +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + self.encoder = json.load(open(vocab_file)) +self.encoder = json.load(open(vocab_file)) + self.encoder = json.load(open(vocab_file)) +FileNotFoundErrorFileNotFoundError: : [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json'[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +FileNotFoundError +: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +/bin/sh: line 0: type: git: not found + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +/bin/sh: line 0: type: git: not found +/bin/sh: line 0: type: git: not found +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +**** Git info for Megatron: git_hash=unknown git_branch=unknown ******** Git info for Megatron: git_hash=unknown git_branch=unknown **** + + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, +set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + set_global_variables(extra_args_provider=extra_args_provider, + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + _ = _build_tokenizer(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + initialize_megatron(extra_args_provider=extra_args_provider, + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + self.encoder = json.load(open(vocab_file)) + _GLOBAL_TOKENIZER = build_tokenizer(args) + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: FileNotFoundError[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + [WARNING]  async_io: please install the libaio-devel package with yum + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +async_io ............... [NO] ....... [NO] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +async_io ............... [NO] ....... [NO] + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +transformer_inference .. [NO] ....... [OKAY] +transformer_inference .. [NO]utils ......................... [OKAY][YES] + ...... [OKAY] + [WARNING]  async_io: please install the libaio-devel package with yum +quantizerutils ................................ [NO][YES] ............. [OKAY][OKAY] + + _GLOBAL_TOKENIZER = build_tokenizer(args) +--------------------------------------------------quantizer + self.encoder = json.load(open(vocab_file)) + .............. [NO] ....... 
[OKAY] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +-------------------------------------------------- +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +/bin/sh: line 0: type: git: not found + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + _GLOBAL_TOKENIZER = build_tokenizer(args) +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + [WARNING]  async_io: please install the libaio-devel package with yum + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] + self.encoder = json.load(open(vocab_file)) +utils .................. [YES] ...... [OKAY] +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + self.encoder = json.load(open(vocab_file)) +FileNotFoundError self.encoder = json.load(open(vocab_file)) +: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  async_io: please install the libaio-devel package with yum +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + initialize_megatron(extra_args_provider=extra_args_provider, +transformer_inference .. [NO] ....... [OKAY] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] + set_global_variables(extra_args_provider=extra_args_provider, +-------------------------------------------------- + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +Traceback (most recent call last): +DeepSpeed general environment info: + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + set_global_variables(extra_args_provider=extra_args_provider, + _ = _build_tokenizer(args) +set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 + set_global_variables(extra_args_provider=extra_args_provider, + File 
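The parallelism layout reported above is internally consistent: the world size must equal the product of the three parallel degrees. A minimal sanity-check sketch of that arithmetic (plain Python written for this note; the variable names are ours, not Megatron's):

    # Sanity check: world_size == TP * PP * DP, the invariant Megatron enforces.
    tensor_model_parallel_size = 4
    pipeline_model_parallel_size = 32
    data_parallel_size = 1

    world_size = (tensor_model_parallel_size
                  * pipeline_model_parallel_size
                  * data_parallel_size)
    assert world_size == 128, f"unexpected world size: {world_size}"
    print(f"world size {world_size} = 4 TP x 32 PP x 1 DP")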
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +using torch.float16 for parameters ... +------------------------ arguments ------------------------ + _ = _build_tokenizer(args)initialize_megatron(extra_args_provider=extra_args_provider, + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +Traceback (most recent call last): + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file)_ = _build_tokenizer(args) + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + accumulate_allreduce_grads_in_fp32 .............. False + adam_beta1 ...................................... 0.9 + adam_beta2 ...................................... 0.95 + adam_eps ........................................ 1e-08 + adlr_autoresume ................................. False + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + adlr_autoresume_interval ........................ 1000 + apply_query_key_layer_scaling ................... True + apply_residual_connection_post_layernorm ........ False + attention_dropout ............................... 0.1 + attention_softmax_in_fp32 ....................... False + bert_binary_head ................................ True + bert_load ....................................... None + _GLOBAL_TOKENIZER = build_tokenizer(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + bf16 ............................................ False + bias_dropout_fusion ............................. True + bias_gelu_fusion ................................ True + biencoder_projection_dim ........................ 
0 + biencoder_shared_query_context_model ............ False + block_data_path ................................. None + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + checkpoint_activations .......................... True + checkpoint_in_cpu ............................... False + checkpoint_num_layers ........................... 1 + clip_grad ....................................... 1.0 + codecarbon_dir .................................. None + consumed_train_samples .......................... 0 + consumed_train_tokens ........................... 0 + consumed_valid_samples .......................... 0 + contigious_checkpointing ........................ False + cpu_optimizer ................................... False + cpu_torch_adam .................................. False + curriculum_learning ............................. False + data_impl ....................................... mmap + data_parallel_size .............................. 1 + data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] + dataloader_type ................................. single + DDP_impl ........................................ local + decoder_seq_length .............................. None + deepscale ....................................... False + deepscale_config ................................ None + deepspeed ....................................... True + deepspeed_activation_checkpointing .............. True + deepspeed_config ................................ ./ds_config.1513102.json + deepspeed_mpi ................................... False + distribute_checkpointed_activations ............. False + distributed_backend ............................. nccl + embedding_path .................................. None + encoder_seq_length .............................. 2048 + eod_mask_loss ................................... False + eval_interval ................................... 1000 + eval_iters ...................................... 5 + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + evidence_data_path .............................. None + exit_duration_in_mins ........................... 1190 + exit_interval ................................... None + ffn_hidden_size ................................. 46400 + finetune ........................................ False + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + fp16 ............................................ True + fp16_lm_cross_entropy ........................... False + fp32_residual_connection ........................ False + gigaflos_no_embeds .............................. 0 + global_batch_size ............................... 2048 + glu_activation .................................. None + hidden_dropout .................................. 0.1 + hidden_size ..................................... 11600 + hysteresis ...................................... 2 + ict_head_size ................................... None + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + ict_load ........................................ 
None + img_dim ......................................... 224 + indexer_batch_size .............................. 128 + indexer_log_interval ............................ 1000 + init_method_std ................................. 0.02 + init_method_xavier_uniform ...................... False + self.encoder = json.load(open(vocab_file)) + initial_loss_scale .............................. 4294967296 + kv_channels ..................................... 145 + layernorm_epsilon ............................... 1e-05 + lazy_mpu_init ................................... None + load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints + local_rank ...................................... 0 +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json'self.encoder = json.load(open(vocab_file)) + + log_batch_size_to_tensorboard ................... True + log_interval .................................... 1 + log_learning_rate_to_tensorboard ................ True + log_loss_scale_to_tensorboard ................... True + log_num_zeros_in_grad ........................... False + log_params_norm ................................. False + log_timers_to_tensorboard ....................... True +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + log_validation_ppl_to_tensorboard ............... True + loss_on_targets_only ............................ False + loss_scale ...................................... 12.0 + loss_scale_window ............................... 1000 + lr .............................................. 6e-05 + lr_decay_iters .................................. None + lr_decay_samples ................................ None + lr_decay_style .................................. cosine + lr_decay_tokens ................................. 260000000000 + lr_warmup_fraction .............................. None + lr_warmup_iters ................................. 0 + lr_warmup_samples ............................... 216320 + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + make_vocab_size_divisible_by .................... 128 + mask_prob ....................................... 0.15 + masked_softmax_fusion ........................... False + max_position_embeddings ......................... 2048 + memory_centric_tiled_linear ..................... False + merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt + micro_batch_size ................................ 1 + min_loss_scale .................................. 1.0 + min_lr .......................................... 6e-06 + mmap_warmup ..................................... False + no_load_optim ................................... None + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + no_load_rng ..................................... None + no_save_optim ................................... None + no_save_rng ..................................... 
None + num_attention_heads ............................. 80 + num_channels .................................... 3 + num_classes ..................................... 1000 + num_layers ...................................... 64 + num_layers_per_virtual_pipeline_stage ........... None + num_workers ..................................... 2 + _ = _build_tokenizer(args) + onnx_safe ....................................... None + openai_gelu ..................................... False + optimizer ....................................... adam + override_lr_scheduler ........................... False + params_dtype .................................... torch.float16 + partition_activations ........................... False + patch_dim ....................................... 16 + pipeline_model_parallel_size .................... 32 + position_embedding_type ......................... PositionEmbeddingType.absolute + profile_backward ................................ False + query_in_block_prob ............................. 0.1 + rampup_batch_size ............................... None + rank ............................................ 0 + remote_device ................................... none + reset_attention_mask ............................ False + reset_position_ids .............................. False + _ = _build_tokenizer(args) File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + + retriever_report_topk_accuracies ................ [] + retriever_score_scaling ......................... False + retriever_seq_length ............................ 256 + sample_rate ..................................... 1.0 + save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints + save_interval ................................... 300 + scatter_gather_tensors_in_pipeline .............. True + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + self.encoder = json.load(open(vocab_file)) + scattered_embeddings ............................ False + seed ............................................ 43 + seq_length ...................................... 2048 + sgd_momentum .................................... 0.9 + short_seq_prob .................................. 0.1 + split ........................................... 949,50,1 + split_transformers .............................. False + synchronize_each_layer .......................... False + tensor_model_parallel_size ...................... 4 + tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + tensorboard_log_interval ........................ 1 + tensorboard_queue_size .......................... 5 + tile_factor ..................................... 1 + titles_data_path ................................ None + tokenizer_name_or_path .......................... None + tokenizer_type .................................. GPT2BPETokenizer +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + train_iters ..................................... 
None + train_samples ................................... 600000000 + train_tokens .................................... 300000000000 + use_checkpoint_lr_scheduler ..................... False + use_contiguous_buffers_in_ddp ................... False + use_cpu_initialization .......................... None + use_one_sent_docs ............................... False + use_pin_memory .................................. False + _GLOBAL_TOKENIZER = build_tokenizer(args)_GLOBAL_TOKENIZER = build_tokenizer(args) + virtual_pipeline_model_parallel_size ............ None + vocab_extra_ids ................................. 0 + vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json + weight_decay .................................... 0.1 + world_size ...................................... 128 + zero_allgather_bucket_size ...................... 0.0 + zero_contigious_gradients ....................... False + zero_reduce_bucket_size ......................... 0.0 + zero_reduce_scatter ............................. False + zero_stage ...................................... 1 +-------------------- end of arguments --------------------- + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +setting number of micro-batches to constant 2048 +> building GPT2BPETokenizer tokenizer ... + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
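The "setting number of micro-batches to constant 2048" line follows directly from the batch arguments above. A sketch of the arithmetic, assuming the standard Megatron relation between global batch, micro batch, and data-parallel size (our restatement, not code from the repo):

    # num_micro_batches = global_batch_size / (micro_batch_size * data_parallel_size)
    global_batch_size = 2048
    micro_batch_size = 1
    data_parallel_size = 1

    num_micro_batches, rem = divmod(global_batch_size,
                                    micro_batch_size * data_parallel_size)
    assert rem == 0, "global batch size must divide evenly into micro-batches"
    print(num_micro_batches)  # -> 2048, matching the log line above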
+Traceback (most recent call last):
+  File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in <module>
+    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain
+    initialize_megatron(extra_args_provider=extra_args_provider,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron
+    set_global_variables(extra_args_provider=extra_args_provider,
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables
+    _ = _build_tokenizer(args)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer
+    _GLOBAL_TOKENIZER = build_tokenizer(args)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer
+    tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file)
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__
+    self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace',
+  File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__
+    self.encoder = json.load(open(vocab_file))
+FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json'
+ self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + initialize_megatron(extra_args_provider=extra_args_provider, + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +Traceback (most recent call last): +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +/bin/sh: line 0: type: git: not found + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +**** Git info for Megatron: git_hash=unknown 
git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + initialize_megatron(extra_args_provider=extra_args_provider, + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace',self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + self.encoder = json.load(open(vocab_file)) + self.encoder = json.load(open(vocab_file)) +FileNotFoundError self.encoder = json.load(open(vocab_file)) +: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +FileNotFoundErrorFileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + _ = _build_tokenizer(args) +/bin/sh: line 0: type: git: not found + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', +DeepSpeed general environment info: + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
+ _GLOBAL_TOKENIZER = 
build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, +utils .................. [YES] ...... [OKAY] + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + [WARNING]  async_io: please install the libaio-devel package with yum + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] + _GLOBAL_TOKENIZER = build_tokenizer(args) +quantizer .............. [NO] ....... 
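Every rank in this span fails at the same point: `build_tokenizer` constructs `_GPT2BPETokenizer`, which calls `json.load(open(vocab_file))` on a `gpt2-vocab.json` that is missing from the repo's `data/` directory. A minimal preflight sketch that would catch this before the job is submitted (the vocab path is taken verbatim from the error above; the `gpt2-merges.txt` name is an assumption, since the log passes `args.merge_file` but never prints its value):

    import json
    import os

    # Paths from the FileNotFoundError above; the merges-file name is an
    # assumption (the traceback passes args.merge_file but never prints it).
    REPO = "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B"
    VOCAB_FILE = os.path.join(REPO, "data", "gpt2-vocab.json")
    MERGE_FILE = os.path.join(REPO, "data", "gpt2-merges.txt")  # assumed name

    for path in (VOCAB_FILE, MERGE_FILE):
        if not os.path.isfile(path):
            raise SystemExit(f"missing tokenizer asset: {path}")

    # Mirrors gpt2_tokenization.py line 164, the statement that raises above.
    with open(VOCAB_FILE) as f:
        encoder = json.load(f)
    print(f"ok: {len(encoder)} BPE vocab entries loaded")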
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
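The "DeepSpeed general environment info" block above is printed per rank by DeepSpeed's environment reporter, the same code behind the `ds_report` command line tool. A sketch for regenerating it from the same conda environment, assuming the `deepspeed.env_report.main()` layout of the 0.5.x series:

    # Regenerates the "DeepSpeed general environment info" block above.
    # deepspeed.env_report is the module behind the `ds_report` console
    # script; calling main() directly is assumed from the 0.5.x layout.
    import deepspeed.env_report as env_report

    env_report.main()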
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + _GLOBAL_TOKENIZER = build_tokenizer(args)_GLOBAL_TOKENIZER = build_tokenizer(args) + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file))self.encoder = json.load(open(vocab_file)) + +FileNotFoundErrorFileNotFoundError: : [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json'[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +/bin/sh: line 0: type: git: not found + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +/bin/sh: line 0: type: git: not found + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + set_global_variables(extra_args_provider=extra_args_provider, File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args)_ = _build_tokenizer(args) + + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) + self.encoder = json.load(open(vocab_file)) +FileNotFoundErrorFileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in +/bin/sh: line 0: type: git: not found + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', +initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: 
'/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + 
set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: 
[Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Killing subprocess 192984 +Killing subprocess 192985 +Killing subprocess 192986 +Killing subprocess 192988 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
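+The traceback above is a plain missing-file failure: every rank opens the path passed via --vocab-file while constructing the tokenizer and aborts when it is absent. A minimal pre-flight check in that spirit could fail the job before launch; this is only a sketch, and check_tokenizer_files is an illustrative helper, not part of Megatron-DeepSpeed:
+
+    import os
+    import sys
+
+    def check_tokenizer_files(vocab_file, merge_file):
+        # Hypothetical helper: fail fast with one clear message instead of
+        # letting every rank raise FileNotFoundError inside GPT2Tokenizer.
+        for path in (vocab_file, merge_file):
+            if not os.path.isfile(path):
+                sys.exit(f"tokenizer file not found: {path}")
+
+    check_tokenizer_files(
+        "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json",
+        "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt",
+    )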
+/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File 
"/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +/bin/sh: line 0: type: git: not found +**** Git info for Megatron: git_hash=unknown git_branch=unknown **** +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py", line 237, in + pretrain(train_valid_test_datasets_provider, model_provider, forward_step, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/training.py", line 97, in pretrain + initialize_megatron(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/initialize.py", line 53, in initialize_megatron + set_global_variables(extra_args_provider=extra_args_provider, + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 93, in set_global_variables + _ = _build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/global_vars.py", line 125, in _build_tokenizer + _GLOBAL_TOKENIZER = build_tokenizer(args) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 43, in build_tokenizer + 
tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file) + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/tokenizer.py", line 274, in __init__ + self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace', + File "/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/tokenizer/gpt2_tokenization.py", line 164, in __init__ + self.encoder = json.load(open(vocab_file)) +FileNotFoundError: [Errno 2] No such file or directory: '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json' +Killing subprocess 2363481 +Killing subprocess 2363482 +Killing subprocess 2363483 +Killing subprocess 2363484 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', 
'--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. +Killing subprocess 684099 +Killing subprocess 684100 +Killing subprocess 684101 +Killing subprocess 684102 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in +Killing subprocess 183506 +Killing subprocess 183507 +Killing subprocess 183508 +Killing subprocess 183509 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', 
'--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. +Killing subprocess 183548 +Killing subprocess 183549 +Killing subprocess 183550 +Killing subprocess 183551 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main +Killing subprocess 185348 + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code +Killing subprocess 185349 +Killing subprocess 185350 +Killing subprocess 185351 +Traceback (most recent call last): + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', 
'--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. +Killing subprocess 179523 +Killing subprocess 179524 +Killing subprocess 179525 +Killing subprocess 179526 +Traceback (most recent call last): + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main +Killing subprocess 182881 +Killing subprocess 182882 +Killing subprocess 182883 +Killing subprocess 182884 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+ return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code +Killing subprocess 183047 +Killing subprocess 183048 +Killing subprocess 183049 +Killing subprocess 183050 + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + return _run_code(code, main_globals, None, + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+ main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr main() +', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+ File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+Killing subprocess 178625 +Killing subprocess 178626 +Killing subprocess 178627 +Killing subprocess 178628 +Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler +Killing subprocess 332558 +Killing subprocess 332559 +Killing subprocess 332560 +Killing subprocess 332561 +Traceback (most recent call last): + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +Killing subprocess 207830 + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main +Killing subprocess 207831 +Killing subprocess 207832 +Killing subprocess 207833 +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+Traceback (most recent call last): + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 194, in _run_module_as_main + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + return _run_code(code, main_globals, None, + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/runpy.py", line 87, in _run_code + exec(code, run_globals) + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in + main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lr', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+ main() + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main + sigkill_handler(signal.SIGTERM, None) # not coming back + File "/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler + raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd) +Killing subprocess 368968 +Killing subprocess 368969 +subprocess.CalledProcessError: Command '['/gpfswork/rech/six/commun/conda/cutting-edge/bin/python', '-u', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/pretrain_gpt.py', '--local_rank=3', '--tensor-model-parallel-size', '4', '--pipeline-model-parallel-size', '32', '--num-layers', '64', '--hidden-size', '11600', '--num-attention-heads', '80', '--seq-length', '2048', '--max-position-embeddings', '2048', '--micro-batch-size', '1', '--global-batch-size', '2048', '--train-samples', '600_000_000', '--train-tokens', '300_000_000_000', '--vocab-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json', '--merge-file', '/gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt', '--loss-scale', '12', '--clip-grad', '1.0', '--fp16', '--checkpoint-activations', '--no-masked-softmax-fusion', '--seed', '43', '--optimizer', 'adam', '--adam-beta1', '0.9', '--adam-beta2', '0.95', '--adam-eps', '1e-8', '--lr', '6e-5', '--min-lrKilling subprocess 368970 +Killing subprocess 368971 +', '6e-6', '--lr-warmup-samples', '216_320', '--lr-decay-tokens', '260000000000', '--lr-decay-style', 'cosine', '--clip-grad', '1.0', '--weight-decay', '1e-1', '--exit-duration-in-mins', '1190', '--log-interval', '1', '--save-interval', '300', '--eval-interval', '1000', '--eval-iters', '5', '--tensorboard-dir', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard', '--tensorboard-queue-size', '5', '--log-timers-to-tensorboard', '--log-batch-size-to-tensorboard', '--log-validation-ppl-to-tensorboard', '--save', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--load', '/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints', '--data-path', '/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document', '--data-impl', 'mmap', '--split', '949,50,1', '--distributed-backend', 'nccl', '--deepspeed', '--deepspeed_config', './ds_config.1513102.json', '--zero-stage', '1', '--deepspeed-activation-checkpointing']' returned non-zero exit status 1. 
+[the identical "Killing subprocess <pid>" lines, torch/distributed/launch.py traceback, and subprocess.CalledProcessError with the same command were emitted by the launcher on each of the remaining nodes, interleaved across ranks; verbatim duplicates elided]
+srun: error: r8i3n4: task 16: Exited with exit code 1
+srun: Terminating job step 1513102.0
+srun: error: r8i4n3: task 24: Exited with exit code 1
+srun: error: r6i3n0: task 0: Exited with exit code 1
+srun: error: r7i7n1: task 9: Exited with exit code 1
+srun: error: r8i4n6: task 27: Exited with exit code 1
+srun: error: r8i3n2: task 14: Exited with exit code 1
+srun: error: r8i4n2: task 23: Exited with exit code 1
+srun: error: r8i3n5: task 17: Exited with exit code 1
+srun: error: r8i4n5: task 26: Exited with exit code 1
+srun: error: r7i6n4: task 3: Exited with exit code 1
+srun: error: r8i3n1: task 13: Exited with exit code 1
+srun: error: r8i3n8: task 20: Exited with exit code 1
+srun: error: r8i5n0: task 30: Exited with exit code 1
+srun: error: r7i6n8: task 7: Exited with exit code 1
+srun: error: r7i6n7: task 6: Exited with exit code 1
+srun: error: r7i6n5: task 4: Exited with exit code 1
+srun: error: r6i3n1: task 1: Exited with exit code 1
+srun: error: r6i3n2: task 2: Exited with exit code 1
+srun: error: r8i3n7: task 19: Exited with exit code 1
+srun: error: r7i6n6: task 5: Exited with exit code 1
+srun: error: r8i4n1: task 22: Exited with exit code 1
+srun: error: r8i4n0: task 21: Exited with exit code 1
+srun: error: r8i2n8: task 11: Exited with exit code 1
+srun: error: r8i5n1: task 31: Exited with exit code 1
+srun: error: r8i4n7: task 28: Exited with exit code 1
+srun: error: r8i3n0: task 12: Exited with exit code 1
+srun: error: r8i3n3: task 15: Exited with exit code 1
+srun: error: r8i4n4: task 25: Exited with exit code 1
+srun: error: r7i7n0: task 8: Exited with exit code 1
+srun: error: r8i3n6: task 18: Terminated
+srun: error: r8i4n8: task 29: Terminated
+srun: error: r8i2n7: task 10: Exited with exit code 1
+srun: Force Terminated job step 1513102.0
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
+[this OMP_NUM_THREADS notice is printed once per launcher on every node at job (re)start; repeated copies elided]
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
+[the OMP_NUM_THREADS notice above repeats once per launcher process on the restarted job step and is collapsed here]
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
+[identical op reports from the remaining ranks were interleaved here and are collapsed]
..................[OKAY] +[OKAY][OKAY][OKAY]-------------------------------------------------- + +sparse_attn transformer[NO]........................ ...................[NO][NO] [OKAY] + [NO].............. transformer.......[OKAY] +................ ................................ op name installedinstalled installed ...................... compatibleinstalledcompatible + +compatible--------------------------------------------------.. +-------------------------------------------------- +-------------------------------------------------- + +compatible +-------------------------------------------------- +stochastic_transformerstochastic_transformerstochastic_transformer .. . [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] +[OKAY] +fused_lamb ............. fused_lamb[NO] .............fused_lambfused_adam....... [NO]..........................[OKAY] + .......[NO][NO] [OKAY].............. +op name op nameop nameop name................ ................................installed................ installedinstalled..installed compatible +...... -------------------------------------------------- +compatiblecompatiblecompatible + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + --------------------------------------------------compatiblecompatiblecompatible + + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +ninjaninjaninja ninja .................. ..................[OKAY] .................. +compatiblecompatible..-------------------------------------------------- + + +--------------------------------------------------compatible-------------------------------------------------- + + +cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES]............... ............... ............... ......[YES][YES] ......[OKAY]...... [YES] +[OKAY][OKAY] + +...... [OKAY] + .................... [NO] [OKAY] [NO] +fused_adam ............. [NO] ....... fused_adamfused_adam[OKAY]fused_adam +compatiblecompatible--------------------------------------------------installed + + + ----------------------------------------------------------------------------------------------------.. + +ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] + + + +-------------------------------------------------- +----------------------------------------------------------------------------------------------------op name +op name + op name................op name ................ installed................ ................ installed installed .. installed .... compatible ..compatible + [NO]fused_adam fused_adam....................fused_lamb [NO][OKAY].......................... + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +op name + + ............ [OKAY] [OKAY]transformer + [NO] +cpu_adamcpu_adam ...............cpu_adam...............cpu_adam [YES].............................. [YES] ......[YES] ......[YES]......[OKAY] ......[OKAY] + +[OKAY] + + [OKAY][OKAY] + + + +cpu_adam ...............cpu_adamcpu_adamcpu_adam [YES] ................................................... [YES][OKAY][YES] + .................. 
--------------------------------------------------[OKAY] [OKAY] + +[OKAY] +-------------------------------------------------- +fused_adam .............fused_adamfused_adam fused_adam.............[NO] [NO]....... .......................... [OKAY] +....... .......[OKAY] +[OKAY] + ............. ..........................[NO] fused_lamb [NO] [NO]....... ............. .............. [OKAY] [OKAY][NO] + +[OKAY] + compatible +-------------------------------------------------- +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + +op name +cpu_adamcpu_adam cpu_adam cpu_adam ............................................. ............... [YES][YES] [YES] [YES].................. ......[OKAY][OKAY][OKAY] + +[OKAY] +compatible +compatible-------------------------------------------------- + +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + + .......[NO][NO] [OKAY].......fused_lamb....... + [OKAY][OKAY]............. + + op nameop name................op name ................ installed................................ installed..installed installed .. compatible ....compatible + + compatible---------------------------------------------------------------------------------------------------- +compatible + +............ transformer ....... stochastic_transformer[NO]............[OKAY] + .......[NO] . [OKAY].......[NO]stochastic_transformer +[OKAY] +ninjaninjaninja ninja...................................................... ..................[OKAY][OKAY] [OKAY] + +[OKAY] +sparse_attn fused_lamb............ .............[NO] sparse_attn [NO] ....... sparse_attn................... ............[OKAY][OKAY] [NO] + +cpu_adam ............... [YES] cpu_adamcpu_adam...... cpu_adam ...............[OKAY] ............... ............... +[YES] ...... ...... ...... [OKAY] [OKAY] +[OKAY] + +op name +----------------------------------------------------------------------------------------------------................ + + --------------------------------------------------op nameinstalledop name +cpu_adam ...............cpu_adam cpu_adam [YES]cpu_adam ............... ..................... ............... [YES][YES][OKAY][YES] +.......[NO][NO] fused_lamb[OKAY].............. +............. [OKAY] [OKAY][NO] +fused_lamb +sparse_attn ............ [NO]sparse_attn ................... [OKAY][NO] +....... fused_lamb[OKAY]fused_lamb fused_lamb +cpu_adam cpu_adam............... cpu_adam...............[YES] ...............cpu_adam......[YES] [YES][OKAY]...... + op nameop name................ op name ................................ installed installed ................installed .... .. installedcompatible compatible +compatible +..-------------------------------------------------- + +-------------------------------------------------- +--------------------------------------------------compatible + +-------------------------------------------------- + +cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES] ................................................... [YES] [YES][OKAY] [YES] +fused_lamb [NO] .............fused_lamb....... [NO].............[OKAY] + +-------------------------------------------------- +-------------------------------------------------- + [OKAY]....... +fused_adam fused_adam.............fused_adamfused_adam ............. [NO]............. 
.............[NO] .......[NO] [OKAY][NO].............. +---------------------------------------------------------------------------------------------------- + +-------------------------------------------------- +-------------------------------------------------- +op name +[NO] .............. [OKAY][OKAY] + +[YES] [YES] [YES] ...... ...... ...... [OKAY][OKAY] + +[OKAY] +fused_adam ............. fused_adamfused_adamfused_adam[NO] .......................... ............. [NO] [NO] [NO]....... ....... .......[OKAY][OKAY] ....... + ................op name .. ................ ................installed compatible + installed..--------------------------------------------------installed +............ ......[OKAY][OKAY] + +[OKAY] + .................... fused_lamb[OKAY]fused_lamb[NO] +sparse_attn sparse_attn transformer....... ............ ............ [OKAY]............ + ............. ..........................[NO] [NO] [NO]....... ....... ....... [OKAY] [OKAY]sparse_attn + +.....................[OKAY] + [YES][OKAY] +cpu_adam ............... cpu_adamcpu_adam[YES] cpu_adam .............................. ...... ...............[YES] [YES][OKAY] +fused_adamfused_adam fused_adam fused_adam............. ............. [NO][NO]............. ............. ....... ....... [NO][OKAY] [NO] +[OKAY] ....... + .................. [OKAY][OKAY][OKAY] + + +.......[NO]sparse_attn [OKAY] ................... +cpu_adamcpu_adam ...............cpu_adam............... cpu_adam [YES][YES] ............... ............... ...... [YES] [OKAY] ......[YES] +...... [OKAY]......[OKAY] +stochastic_transformer . [OKAY]stochastic_transformer[NO] + [OKAY].......[OKAY] +fused_lamb + .............[OKAY] +op name op nameop name ................ ................ ................................installed installedinstalled..installed ..compatible.. compatible.. +transformer ............ transformer[NO]transformer ................... sparse_attn............[NO][OKAY] ............[NO]....... +fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam + + [OKAY][OKAY] + + compatible.... +fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam + ................................. [NO][OKAY][NO] + .............. [OKAY][OKAY] + + [NO][NO][NO] transformer ....... .......................... [NO][OKAY][OKAY][OKAY] + +[OKAY] +............ [NO] ....... [OKAY] +...... [OKAY] +[YES] .................. [OKAY][OKAY] +[OKAY] + + .......fused_lamb[OKAY] + .............[OKAY]fused_lamb +fused_adam ............. [NO] ....... [OKAY]fused_adam + [OKAY][NO] + sparse_attn....... ............[OKAY] + +[OKAY] +. ........[NO] [OKAY] +[NO]....... .......[OKAY] +[OKAY] +fused_lamb[NO]fused_lamb fused_lamb....... ....................................... [NO][OKAY][NO] [NO] +compatible + -------------------------------------------------- +-------------------------------------------------- +compatible +-------------------------------------------------- + +-------------------------------------------------- + .......[NO][OKAY] stochastic_transformer +.......[OKAY] +[OKAY].stochastic_transformer + fused_adam............. ............. ............. fused_lamb[NO][NO] ........................... [NO] [OKAY][NO][OKAY]....... + +fused_lamb .............fused_lamb fused_lamb fused_lamb [NO] ............. ............. .................... [NO] [NO] [NO][OKAY] ....... + --------------------------------------------------compatiblecpu_adamcompatible + + + fused_adam.............fused_lamb............. ..........................[NO][NO] .......[NO] ....... 
[NO] [OKAY] [OKAY] +....... +sparse_attn ............ [NO] ....... [OKAY]sparse_attn + +.......transformertransformer [OKAY] ............ +transformer ............ sparse_attn[NO]sparse_attnsparse_attn ............ ............................... [NO] [OKAY] [NO][NO] +fused_adam ............. [NO]fused_adam .................... [OKAY][NO] +fused_adam fused_adam.............fused_adam fused_adam [NO] .......................... ............. [NO][NO][NO] ..................... ....... [OKAY][OKAY] + [NO]............. fused_lamb .......fused_lamb [NO]............. [OKAY] +....................[NO] [NO][OKAY] +fused_adamfused_adam fused_lamb....................................... .............[NO][NO][NO] [NO].............. ....... [OKAY].......[OKAY][OKAY] + + +[NO] ....... transformer[OKAY] +fused_adam ............. [NO]fused_adam ....................fused_adamfused_adam [OKAY][NO].......................... + ....... ....... ....... [OKAY][OKAY] +[OKAY] + +cpu_adamcpu_adam cpu_adam .............................. [YES]cpu_adam...............[YES] ...... [YES][OKAY]..................... + [NO]stochastic_transformer . transformer....... ............. [NO] [OKAY] [NO] + ....... [OKAY]fused_lamb[OKAY] +fused_lamb +....... .......[OKAY] +[OKAY][OKAY] + +...............-------------------------------------------------- -------------------------------------------------- +[YES] + cpu_adam...... ...............[OKAY] +....... [OKAY]fused_lambfused_lamb +[OKAY] + ............ transformer[NO] sparse_attnsparse_attn ............ .......[NO]........................ .......[OKAY] [NO] +............ [NO]stochastic_transformer[NO] stochastic_transformer............... [NO] [OKAY]. +.......[OKAY] +[OKAY]stochastic_transformer[NO] +....... [OKAY]..............stochastic_transformer + [OKAY][OKAY] + + fused_adam....... fused_adam.............fused_lamb[OKAY] +[OKAY][OKAY] + + +.............. [OKAY][OKAY] + +[OKAY] +............sparse_attn transformer [NO]sparse_attn ............................... ............[NO][OKAY] +.......[NO][NO] [OKAY]fused_lamb +sparse_attn ............ [NO] ....... [OKAY] + ......[OKAY][YES] + ......[OKAY] +[OKAY] +[NO] ....... ....... ....... [OKAY] [OKAY] + +[OKAY] + .......................... [NO][NO]fused_lamb ........................... [OKAY][OKAY][NO] + + ....... [OKAY] +sparse_attn ............ [NO] sparse_attn....... sparse_attn ............ [OKAY] sparse_attn............ + [YES] ......cpu_adam [OKAY]cpu_adam............... +.......................... [NO][NO] fused_lamb.............. .............[OKAY][OKAY] sparse_attn + +[NO] [OKAY]transformer....... + .......[OKAY]............ +stochastic_transformer ....... .[OKAY]. +.transformer transformer............[NO]transformer ............[NO]................... [NO] .......[OKAY] [NO] ....... +[OKAY] +.......[OKAY] +.............[NO]............. fused_lamb ....... [NO] [NO][OKAY].................... +fused_lamb .............fused_lambfused_lamb fused_lamb .............[NO] ..........................[NO] [NO][NO]....... ..............[OKAY] .......[OKAY] + +sparse_attn ............ [NO] ....... [OKAY]sparse_attn +fused_lambfused_lamb fused_lamb............. .............[NO]............. [NO].......[NO] .......[OKAY].......sparse_attn +[OKAY] ............ +[NO] [NO].............. stochastic_transformer .......[OKAY][OKAY] + +[OKAY]. +.............. .............fused_lamb[OKAY][OKAY] +[NO] +............. .......[NO] fused_lamb [OKAY] .......fused_lamb +sparse_attnsparse_attn transformersparse_attn............ ............ 
............ [NO]............ [NO] [NO].......[NO] ....... ....... [OKAY] ....... +[OKAY] [OKAY] + +[OKAY] +fused_adam ............. [NO] ....... [OKAY]fused_adamfused_adam +stochastic_transformer . [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY]sparse_attn +[NO] ............[NO]transformer....... [OKAY][NO]............ + ...............[YES] fused_adam[YES] ......................... fused_adam [OKAY][NO] [OKAY] + [NO]............ [NO]....... ....... [OKAY][OKAY] + + stochastic_transformer[OKAY][NO] transformer + . ....... ............transformer [NO] [OKAY]...................[NO] + [NO][NO] .............. [OKAY][OKAY] + +[OKAY] + .......[NO][OKAY]fused_lamb ....... + [OKAY]............. + [OKAY][NO] +[OKAY][OKAY] + + sparse_attn............sparse_attn transformer ........................ [NO] ............[NO][NO] ....... ....... [NO]....... [OKAY] [OKAY] +.......[OKAY] +[OKAY] +[NO] ....... [OKAY] + transformer[NO] stochastic_transformertransformer ............ ....... ............. [NO][OKAY] [NO] + ............. [OKAY] ............. +transformertransformertransformer stochastic_transformer ............ ............ ............[NO] [NO]. [NO] ..............[NO] .......[OKAY]....... + [OKAY] [OKAY] +[OKAY] + fused_adam.......................... fused_lamb............. [NO] [NO] .............[NO] [NO]....... .............. ....... [OKAY][OKAY] +[OKAY] +sparse_attn ............transformer sparse_attn............ [NO] ............ ............[NO][NO]....... .......[NO].......[OKAY] +.......[OKAY][OKAY] + +[OKAY]transformer +....... ....... transformer[NO] [OKAY]............[OKAY] .......[NO] + + [OKAY]....... + [OKAY]transformertransformer + +.................... [NO][OKAY] +transformersparse_attnsparse_attn .................................... [NO][NO][NO] sparse_attn....... ....... ....... ............[OKAY][OKAY] + [OKAY] + [OKAY][NO]....... + stochastic_transformer ....... [OKAY] +stochastic_transformer stochastic_transformer. stochastic_transformer .[NO] . [NO] ....... [NO] ....... [OKAY] ....... +[OKAY] +[OKAY] + ....... [OKAY]fused_lamb +sparse_attnsparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY]sparse_attn ............ +[OKAY]............[NO] + +transformer[OKAY] transformer +transformer ............sparse_attn [NO]sparse_attn ................... sparse_attn [NO] ............[OKAY] ............ + .......[NO][NO] stochastic_transformer[OKAY].............. +[NO] ....... ....... ....... [OKAY][OKAY] + +[OKAY] +[NO] [NO]....... .......[OKAY] +[OKAY] +stochastic_transformer +[OKAY] + + transformer............stochastic_transformer transformer ............[NO] ............. [NO] .......[NO] [NO] ....... .......[OKAY] ....... [OKAY] +[OKAY] + +[OKAY] + ............stochastic_transformer............ [NO]stochastic_transformer[NO] ............... .[NO] [OKAY][NO][OKAY] + +....... .......[OKAY] +[OKAY]stochastic_transformer +....... [OKAY]fused_lamb +[NO] +[OKAY]. + ............. sparse_attn[NO] ............ .......sparse_attn[NO] [OKAY]................... +sparse_attn [NO] [OKAY] ............ + transformer [NO]transformer ....... ....... ............ ............[OKAY] [OKAY] + [NO] +............transformer ............[NO]............stochastic_transformer [NO][NO]........ ....... [OKAY]....... [NO] + [OKAY] [OKAY] +....... + [OKAY].[OKAY] + transformer +stochastic_transformer stochastic_transformer. [NO]. .......[NO] [OKAY]....... +sparse_attn sparse_attn............ ............[NO] [NO]....... 
.......sparse_attnsparse_attn[OKAY] +[OKAY]........................ + stochastic_transformer.stochastic_transformer [NO] ......... [NO][NO][OKAY] +fused_lambfused_lambfused_lamb ....................................... [NO][NO][NO] .......sparse_attn.............. [OKAY][OKAY][OKAY]............ +stochastic_transformerstochastic_transformer stochastic_transformer .. .[NO][NO] [NO].............. .......[OKAY][OKAY] + +[OKAY] +stochastic_transformer .. [NO][NO] ....... .......[OKAY] +[OKAY] + ............. [NO]fused_adam fused_adamfused_lamb....... ..........................[OKAY] ............. + stochastic_transformer.......transformer transformer ............[OKAY] ............. +[NO][NO][NO] ..............transformer ....... [OKAY][OKAY]............ + + stochastic_transformer[NO] stochastic_transformer........ [OKAY]. +....... [NO] [OKAY]transformer....... + ............[OKAY] +[NO] .......transformer....... transformer [OKAY][OKAY] + ............ + [OKAY]stochastic_transformer +[NO]transformer transformer................... ............ ............[NO] [OKAY] .......[NO] + [OKAY] + transformer[NO]transformer[NO] ............ ............[NO].............. [NO] .......[OKAY][OKAY] + .............. [OKAY][OKAY] + + + +[NO] ....... [OKAY] +[NO] [NO][NO]....... ..............[OKAY] +[OKAY][OKAY] + [OKAY][NO] +[NO] [NO]....... .......[OKAY] +[OKAY] +[NO]transformer ...................transformer [OKAY]sparse_attn[NO] +............ [NO]stochastic_transformer[NO] stochastic_transformer.............. .[OKAY] + stochastic_transformerstochastic_transformer . .[NO]. [NO].......[NO] ..............[OKAY] +[OKAY][OKAY] + + [NO][OKAY]....... +....... [OKAY][OKAY] + + +.......[OKAY]transformer + [OKAY]transformer............ +transformer ............ [NO] sparse_attn.......sparse_attnsparse_attn [OKAY].................................... +fused_lamb +stochastic_transformer ....... .stochastic_transformer [OKAY] [NO] + ........ [OKAY][NO] +............ .......[NO]............ [OKAY].......stochastic_transformer[NO] + [OKAY]. + .[OKAY][NO] +stochastic_transformer [NO] ....... .......stochastic_transformer[OKAY] . . +stochastic_transformer .stochastic_transformer stochastic_transformer[NO] ........ . [NO][NO] [OKAY] ....... + stochastic_transformer............[NO] stochastic_transformer[NO]. ....... ........ [NO] [NO][OKAY] [OKAY] +....... + [NO][NO][NO]stochastic_transformer ....... ..............[OKAY]. + sparse_attn............. ............fused_lamb[NO] .............[NO].......sparse_attn [NO] .......[OKAY] ............ + [OKAY][NO] +stochastic_transformer ....... [OKAY]. + [NO] ....... [OKAY] +....... stochastic_transformer[NO] stochastic_transformer [OKAY]........ . [NO] + [OKAY] [NO]....... + [OKAY][NO] +....... [OKAY][OKAY] + +....... [OKAY][OKAY]stochastic_transformer + + [OKAY][OKAY][NO] + +....... transformer....... [OKAY][OKAY]............ + + ....... [OKAY]transformer[OKAY] + + ............ [NO] ....... [OKAY] + [NO]....... .......[OKAY] +[OKAY] +stochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] + + transformer....... transformer ............transformer [OKAY] [NO]........................ + [NO].......[NO] [OKAY].............. + [NO] .......transformersparse_attn ............[OKAY]............ +stochastic_transformer . [NO] ....... [OKAY] + [OKAY][OKAY] + + [NO] [NO]....... 
sparse_attnstochastic_transformer.......[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +stochastic_transformer stochastic_transformerstochastic_transformer . ..[NO] [NO][NO]....... ..............[OKAY] +[OKAY][OKAY] + + ............[OKAY] .stochastic_transformer [NO] +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at +[NO] ............... transformer [OKAY] [OKAY] +[NO] + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + ................... transformer[OKAY][NO] + + +JIT compiled ops requires ninja +-------------------------------------------------- +JIT compiled ops requires ninja + ................... [NO][OKAY] +....... [OKAY] +stochastic_transformer stochastic_transformer. .[NO] [NO]....... .......[OKAY] +[OKAY] +ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] + +[OKAY] +---------------------------------------------------------------------------------------------------- + + +----------------------------------------------------------------------------------------------------op nameop name + + op name................................op name installed................................installed ..installed.. installedcompatible + --------------------------------------------------compatible.. +.. + --------------------------------------------------compatiblecompatible + + +-------------------------------------------------- +-------------------------------------------------- +cpu_adam ............... [YES] cpu_adam...... ...............cpu_adam [OKAY][YES] +cpu_adam .................................... [OKAY][YES][YES] + ............fused_adam [OKAY][OKAY]............. + +[NO] ....... fused_adam[OKAY] +............. [NO]fused_lamb ....... fused_adam fused_adam.............[OKAY] + [NO].......................... .......[NO][NO] fused_lamb [OKAY]........................... + [OKAY] [NO] + [OKAY]....... + [OKAY]fused_lamb + .............fused_lamb [NO]............. sparse_attn[NO] ....... ............ ....... [OKAY] [NO] +sparse_attn [OKAY] ....... + ............[OKAY] +[NO] .......transformer [OKAY]............ + [NO]sparse_attn transformer....... 
............sparse_attn............[OKAY] +[NO] ............[NO]stochastic_transformer....... .......[NO] .[OKAY] [OKAY] + ....... +[NO] transformer [OKAY] ....... +............stochastic_transformer [OKAY]transformer[NO] +. ....... ............ [NO] [OKAY][NO]....... + .......[OKAY] +stochastic_transformer[OKAY] +. [NO]stochastic_transformer ........ [OKAY][NO] + ....... [OKAY] +ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] +[OKAY][OKAY] + +-------------------------------------------------- +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + +op name +op name op name ................op name ................ ................ installed ................ installedinstalled .. ..installedcompatible .. +compatible.. +-------------------------------------------------- compatible-------------------------------------------------- + +compatible + +-------------------------------------------------- +-------------------------------------------------- +cpu_adam cpu_adam............... cpu_adam...............[YES]cpu_adam [YES].................................... ...... [YES] [YES][OKAY][OKAY] + + ............ [OKAY][OKAY] + +fused_adamfused_adam .......................... [NO][NO] fused_adam .......fused_adam ....... .............[OKAY].............[OKAY] + +[NO][NO] .............. fused_lamb fused_lamb [OKAY][OKAY]............. + + .............[NO] fused_lamb[NO] .......fused_lamb.................... [OKAY].............[OKAY][NO] + + [NO]....... .......[OKAY] +[OKAY] +sparse_attnsparse_attn ........................ [NO][NO] ..............sparse_attn sparse_attn ............[OKAY][OKAY] ............ +[NO] + [NO]....... transformer .......transformer[OKAY] +[OKAY] ............ +............ transformer [NO]transformer[NO] .......................... ............ [NO][OKAY] [OKAY] +[NO] + .............. [OKAY]stochastic_transformer[OKAY] + stochastic_transformer + . stochastic_transformer.[NO]stochastic_transformer [NO]....... . ........[OKAY] [OKAY] +[NO][NO] + .............. [OKAY][OKAY] + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- +DeepSpeed C++/CUDA extension op report +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninjaninjaninja ...................................................... ninja [OKAY][OKAY]..................[OKAY] + + + ------------------------------------------------------------------------------------------------------------------------------------------------------ +[OKAY] + + +op nameop name op name --------------------------------------------------................ ................ + ................ installed op nameinstalled installed.................... ..installedcompatiblecompatible + +compatible--------------------------------------------------..-------------------------------------------------- + + +compatible-------------------------------------------------- + +-------------------------------------------------- +cpu_adam cpu_adam............... ...............cpu_adam[YES] [YES]cpu_adam .......................................... [OKAY][OKAY] [YES] + +[YES] ............ [OKAY][OKAY] + +fused_adamfused_adam ..........................fused_adam fused_adam[NO] [NO] .......................... .............. [NO] [NO][OKAY] [OKAY] + ....... +....... fused_lamb[OKAY][OKAY] + +fused_lamb............. fused_lamb[NO]fused_lamb .......................... ....... ............. [NO][NO][OKAY][NO] + ..................... [OKAY][OKAY][OKAY] + + +sparse_attn ............ [NO]sparse_attn sparse_attn....... sparse_attn ........................[OKAY]............ [NO][NO] +[NO] ..............transformer ....... ............[OKAY][OKAY] + +[NO][OKAY] +.......transformertransformer [OKAY]transformer............ +............ ............[NO][NO] stochastic_transformer [NO] .............. ........[OKAY][OKAY] +[NO] +[OKAY] +....... stochastic_transformer[OKAY]stochastic_transformer stochastic_transformer + . ..[NO] [NO][NO]....... ..............[OKAY] +[OKAY][OKAY] + +------------------------------------------------------------------------------------------------------------------------------------------------------ + +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja +-------------------------------------------------- + + +JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja + + +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................. ....................................[OKAY] [OKAY] + +[OKAY][OKAY]-------------------------------------------------- + + +----------------------------------------------------------------------------------------------------op name +-------------------------------------------------- + op name +................op name op name................ installed ................................ installed .. compatibleinstalledinstalled + .. ..-------------------------------------------------- ..compatiblecompatible + + +compatible +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +cpu_adam ............... [YES] cpu_adam cpu_adam...............cpu_adam ...... [YES]............... ............... ...... [YES] [YES][OKAY]...... [OKAY] ...... +[OKAY] + +[OKAY] +fused_adamfused_adam fused_adam ............. fused_adam.............[NO]............. ....................[NO] [NO] [NO][OKAY]....... ....... + ....... [OKAY] fused_lamb +[OKAY][OKAY] + +.............fused_lamb [NO]fused_lambfused_lamb ............. ....... .............[OKAY] .............[NO] +[NO]....... [NO] ....... [OKAY] +.......[OKAY] +[OKAY] +sparse_attn ............ [NO] ....... [OKAY]sparse_attn + ............transformersparse_attn [NO]........................ [NO].......[NO] sparse_attn ....... [OKAY]....... ............ + [OKAY][OKAY] + transformer +[NO] transformer............stochastic_transformer ....... ............[NO]. [OKAY] [NO][NO] +....... ....... ....... [OKAY][OKAY] + +transformer[OKAY] stochastic_transformer +............ [NO]. stochastic_transformer ....... [NO] . ....... [OKAY][NO] + [OKAY]....... + [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- + + +--------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +ninjaninjaninjaninja ...................................................... ..................[OKAY] [OKAY] + +[OKAY][OKAY]-------------------------------------------------- +-------------------------------------------------- + + +--------------------------------------------------op name +op name-------------------------------------------------- op name + ................ ................ op name................ installed installed................ ....installedinstalled compatible..compatible +.. + --------------------------------------------------compatible-------------------------------------------------- + +compatible +-------------------------------------------------- + +-------------------------------------------------- +cpu_adamcpu_adam cpu_adam............... cpu_adam............... ............... [YES]............... [YES] ...... [YES][YES] ...... [OKAY]............[OKAY] + +[OKAY][OKAY] + +fused_adam .............fused_adam [NO]fused_adam.............fused_adam ....... ............. [NO] [OKAY].............[NO]....... + [NO].......[OKAY] +.......fused_lamb[OKAY] +[OKAY]fused_lamb............. + fused_lamb.............[NO] ....................fused_lamb[NO] .......[NO] [OKAY] ............. +[OKAY] ....... +[NO] [OKAY]....... + [OKAY] +sparse_attn sparse_attn............ ............[NO] [NO]....... sparse_attn.......[OKAY]sparse_attn + [OKAY]........................ + [NO][NO]transformer transformer.......................... [OKAY][NO]............ +[OKAY] +.......[NO] transformer[OKAY].......transformer + ............ [OKAY]............ + [NO]stochastic_transformer[NO] ........stochastic_transformer ....... [OKAY].[OKAY][NO] + ....... + stochastic_transformer[OKAY][NO] + stochastic_transformer....... . [OKAY]. +[NO] [NO]....... .......[OKAY] +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +------------------------------------------------------------------------------------------------------------------------------------------------------ + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +JIT compiled ops requires ninja-------------------------------------------------- +-------------------------------------------------- + +JIT compiled ops requires ninja +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] +[OKAY][OKAY] +-------------------------------------------------- + + +---------------------------------------------------------------------------------------------------- +op name + --------------------------------------------------op name................op name + ................................installedop name ..installed................ installed compatible installed.. + .. ..-------------------------------------------------- +compatible compatible +compatible + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +cpu_adam ............... [YES] ...... [OKAY]cpu_adamcpu_adam +cpu_adam ............................................. [YES][YES][YES] ...... ......fused_adam ...... [OKAY][OKAY] ............. + +[OKAY][NO] +....... [OKAY] +fused_lamb fused_adam............. fused_adam fused_adam............. [NO] ............. .............[NO] ....... [NO][NO] .......[OKAY] ....... +.......[OKAY][OKAY] + +[OKAY] +fused_lambfused_lamb fused_lamb.......................... ............. [NO]sparse_attn [NO] [NO].......................... ....... [NO][OKAY] +[OKAY][OKAY]....... + + [OKAY] +transformer ............ [NO] sparse_attn....... ............[OKAY]sparse_attn +sparse_attn [NO]........................stochastic_transformer .......[NO][NO] .[OKAY].............. +[NO] [OKAY] [OKAY]transformer +....... + transformer............[OKAY] +............transformer[NO] [NO]................... .......[NO][OKAY] +[OKAY]....... + [OKAY]stochastic_transformer + stochastic_transformer . stochastic_transformer.[NO] .[NO]....... [NO][OKAY]....... + .......[OKAY] +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja + +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. 
+-------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja + +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] + +[OKAY]--------------------------------------------------[OKAY] +-------------------------------------------------- + + +--------------------------------------------------op nameop name +-------------------------------------------------- +op name................................ op name installed................ installed ................ ..installed ..compatibleinstalled.. + ..compatible-------------------------------------------------- +compatible + +--------------------------------------------------compatible + +-------------------------------------------------- +-------------------------------------------------- +cpu_adamcpu_adamcpu_adamcpu_adam ............................................................ [YES][YES][YES][YES] ...... ......[OKAY]............ + [OKAY][OKAY][OKAY] + + +fused_adam .............fused_adam fused_adamfused_adam [NO] ............. ............. ....... ............. [NO][NO][NO] [OKAY] ....... ....... + ....... [OKAY][OKAY] +fused_lamb +[OKAY] +............. [NO] fused_lamb.......fused_lambfused_lamb ..........................[OKAY]............. + [NO][NO][NO] ....... .............. [OKAY][OKAY][OKAY] + + +sparse_attn ............ [NO] ....... [OKAY] +transformersparse_attnsparse_attn ........................ sparse_attn............ [NO] [NO] [NO]....... .......[OKAY]................... + [OKAY][OKAY] + +stochastic_transformer[NO] transformertransformer....... .........................[OKAY] + [NO][NO][NO]transformer .............. ....... ............[OKAY] [OKAY] +[NO][OKAY] + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- + +-------------------------------------------------- +.......stochastic_transformer stochastic_transformer [OKAY] +.. [NO] [NO]....... .......[OKAY]stochastic_transformer + [OKAY] +DeepSpeed C++/CUDA extension op report + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op. +--------------------------------------------------JIT compiled ops requires ninja + + +. [NO] ....... [OKAY] +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +ninjaninjaninja ....................................ninja .................. [OKAY] [OKAY] + .................. + --------------------------------------------------[OKAY][OKAY]-------------------------------------------------- + + + +op name-------------------------------------------------- --------------------------------------------------op name................ + + ................installedop nameop name .................. installed compatible................ installed + .. --------------------------------------------------..installed + compatiblecompatible + +..---------------------------------------------------------------------------------------------------- + +cpu_adamcompatible +............... [YES]-------------------------------------------------- +...... cpu_adam[OKAY] cpu_adam +............... ............... [YES][YES] ............cpu_adam fused_adam[OKAY][OKAY]............... + + [YES]............. [NO]...... ....... [OKAY][OKAY] + +fused_adamfused_adamfused_lamb ............. ..........................[NO] [NO][NO]....... .......[OKAY].......fused_adam + [OKAY].............[OKAY]fused_lamb + + .............[NO]fused_lamb [NO] .................... ....... [NO] [OKAY] [OKAY]sparse_attn....... + + ............[OKAY] +fused_lamb[NO] .................... [OKAY] +[NO] transformer....... ............[OKAY] sparse_attn[NO] + sparse_attn................... ............[NO][OKAY] + [NO]....... stochastic_transformer [OKAY] ....... + .[OKAY] +transformer[NO]sparse_attn ...................transformer ............ [NO] [OKAY]............[NO]....... + [OKAY][NO]....... + .......[OKAY] [OKAY] +stochastic_transformer + transformer. stochastic_transformer[NO] ............ ........ [NO][OKAY] [NO] +....... ....... [OKAY][OKAY] + +stochastic_transformer . [NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
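+The report above shows every op except cpu_adam uninstalled but JIT-compatible:
+DeepSpeed builds such ops with ninja the first time they are used. To avoid JIT
+builds at job startup, the ops can be prebuilt at install time instead. A minimal
+sketch, assuming nvcc and a CUDA-enabled torch are already present in the
+environment:
+
+    # Prebuild all compatible DeepSpeed ops instead of relying on JIT.
+    DS_BUILD_OPS=1 pip install deepspeed --no-cache-dir
+
+    # Or prebuild a single op, e.g. the fused Adam optimizer.
+    DS_BUILD_FUSED_ADAM=1 pip install deepspeed --no-cache-dir
+
+    # Re-check afterwards; ds_report prints the same compatibility table.
+    ds_report
+
+DS_BUILD_OPS and the per-op DS_BUILD_* switches are DeepSpeed's documented install
+flags; which ops actually build still depends on the local CUDA toolchain.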
+ [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + ............... [NO] ....... [NO] +async_io ............... [NO] .......transformer_inference [NO].. + [NO] ....... [OKAY] +utils .................. [YES]transformer_inference ........ [OKAY][NO] + ....... quantizer[OKAY] +.............. [NO] ....... [OKAY] +utils .................. [YES] --------------------------------------------------...... + [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +transformer_inference async_io.. [NO] ...................... [NO][OKAY] +....... [NO] +utils .................. [YES] ...... [OKAY] +transformer_inferencequantizer ................ [NO][NO] .............. [OKAY][OKAY] + +--------------------------------------------------utils + .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io + ............... [NO] ....... [NO] +async_io [WARNING]  async_io: please install the libaio-devel package with yum ............... [NO] +.......transformer_inference [NO].. + [NO] ....... [OKAY] +utils .................. transformer_inference[YES] ........ [OKAY][NO] + ....... [OKAY]quantizer + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + .............. [NO] .......utils [OKAY].................. + [YES] ...... [OKAY]-------------------------------------------------- + +async_io ............... [NO] ....... [NO] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +async_io ............... [NO] ....... 
[NO] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io +transformer_inference .. [NO] ....... [OKAY] + ............... [NO] ....... [NO] +async_io ............... [NO]transformer_inference ......... [NO][NO] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] + ....... [OKAY] +-------------------------------------------------- +utilstransformer_inference .................... [YES][NO] ............. [OKAY][OKAY] + +quantizer .............. utils[NO] ......................... [YES][OKAY] +...... [OKAY] +-------------------------------------------------- +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  async_io: please install the libaio-devel package with yum +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +transformer_inference utils.. ..................[NO] [YES]....... ......[OKAY] +[OKAY] +quantizer ..............utils [NO].................. .......[YES] [OKAY]...... + [OKAY] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +-------------------------------------------------- +async_io ............... [NO] ....... [NO] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +transformer_inference .. [NO] ....... [OKAY] +async_io ............... [NO] ....... [NO] +async_io ............... [NO] ....... [NO] +utils .................. [YES] ...... [OKAY] +transformer_inference .. [NO] ....... [OKAY] +transformer_inference ..utils [NO].................. .......[YES] ......[OKAY] +[OKAY] +quantizer .............. [NO] ....... [OKAY] +quantizer utils.............. ..................[NO] [YES]....... ......[OKAY] +[OKAY] +-------------------------------------------------- +--------------------------------------------------quantizer + .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... 
[NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +............... + [NO] ....... [NO] +async_ioasync_io ............... [NO]transformer_inference............... .........[NO] [NO].......[NO] +.......[NO] +[OKAY] +utils .................. [YES] ...... transformer_inference[OKAY] +transformer_inference.. ..[NO]quantizer [NO]..................... .......[OKAY][NO] + [OKAY]....... + [OKAY] +utils utils..................-------------------------------------------------- +..................[YES] [YES]...... ......[OKAY] +[OKAY] +quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] + +-------------------------------------------------- +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  async_io: please install the libaio-devel package with yum +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +transformer_inference .. [NO] ....... [OKAY] +async_io ............... [NO] ....... [NO] +utils .................. [YES] ...... [OKAY] +transformer_inference .. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +-------------------------------------------------- +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. 
[NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] + [WARNING]  async_io: please install the libaio-devel package with yum +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... async_io[NO] + ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +transformer_inference .. [NO] .......utils [OKAY].................. + [YES] ...... [OKAY] +utils .................. [YES]quantizer .................... [OKAY][NO] + ....... [OKAY]quantizer + .............. [NO] --------------------------------------------------....... + [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +DeepSpeed general environment info: +transformer_inference .. [NO] ....... [OKAY] +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +utils .................. [YES] ...... [OKAY] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io async_io............... [NO]............... .......[NO] [NO]....... + [NO] +transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] +[OKAY] +utils ..................utils [YES].................. ......[YES] [OKAY]...... + [OKAY] +quantizer quantizer.............. ..............[NO] [NO]....... .......[OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io + ............... [NO] ....... [NO] +async_io ............... [NO] .......transformer_inference [NO].. + [NO] ....... [OKAY] +utils transformer_inference.................. ..[YES] [NO]...... .......[OKAY] +[OKAY] +quantizer .............. utils[NO] ......................... [YES][OKAY] +...... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +[NO] ....... [NO] +async_io transformer_inference............... ..[NO] [NO]....... .......[NO] +[OKAY] +utils .................. [YES] ...... [OKAY]transformer_inference + .. quantizer[NO] ..................... [NO][OKAY] +....... [OKAY] +utils --------------------------------------------------.................. + [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY]
+--------------------------------------------------
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
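+[note] The op compatibility table above and the "DeepSpeed general environment info" block below are printed once per launched rank, which is why they repeat throughout this log. They appear to be the same report that DeepSpeed's ds_report utility emits; a minimal sketch for reproducing it by hand, assuming the same conda environment is active:
+
+    # print DeepSpeed's op compatibility table and environment summary
+    ds_report
+
+    # the async_io warning can usually be cleared by installing the libaio
+    # headers, as the log itself suggests (on yum-based systems):
+    sudo yum install libaio-devel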
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... DeepSpeed general environment info:11.2 +deepspeed install path + ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']torch install path +deepspeed info .................................. 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ......['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch 1.8, cuda 11.1 +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ******** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + +using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 +using torch.float16 for parameters ... +------------------------ arguments ------------------------ + accumulate_allreduce_grads_in_fp32 .............. False + adam_beta1 ...................................... 0.9 + adam_beta2 ...................................... 0.95 + adam_eps ........................................ 1e-08 + adlr_autoresume ................................. False + adlr_autoresume_interval ........................ 1000 + apply_query_key_layer_scaling ................... True + apply_residual_connection_post_layernorm ........ False + attention_dropout ............................... 0.1 + attention_softmax_in_fp32 ....................... False + bert_binary_head ................................ True + bert_load ....................................... None + bf16 ............................................ False + bias_dropout_fusion ............................. True + bias_gelu_fusion ................................ True + biencoder_projection_dim ........................ 0 + biencoder_shared_query_context_model ............ False + block_data_path ................................. None + checkpoint_activations .......................... True + checkpoint_in_cpu ............................... False + checkpoint_num_layers ........................... 1 + clip_grad ....................................... 1.0 + codecarbon_dir .................................. None + consumed_train_samples .......................... 
0 + consumed_train_tokens ........................... 0 + consumed_valid_samples .......................... 0 + contigious_checkpointing ........................ False + cpu_optimizer ................................... False + cpu_torch_adam .................................. False + curriculum_learning ............................. False + data_impl ....................................... mmap + data_parallel_size .............................. 1 + data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] + dataloader_type ................................. single + DDP_impl ........................................ local + decoder_seq_length .............................. None + deepscale ....................................... False + deepscale_config ................................ None + deepspeed ....................................... True + deepspeed_activation_checkpointing .............. True + deepspeed_config ................................ ./ds_config.1587010.json + deepspeed_mpi ................................... False + distribute_checkpointed_activations ............. False + distributed_backend ............................. nccl + embedding_path .................................. None + encoder_seq_length .............................. 2048 + eod_mask_loss ................................... False + eval_interval ................................... 1000 + eval_iters ...................................... 5 + evidence_data_path .............................. None + exit_duration_in_mins ........................... 55 + exit_interval ................................... None + ffn_hidden_size ................................. 46400 + finetune ........................................ False + fp16 ............................................ True + fp16_lm_cross_entropy ........................... False + fp32_residual_connection ........................ False + gigaflos_no_embeds .............................. 0 + global_batch_size ............................... 2048 + glu_activation .................................. None + hidden_dropout .................................. 0.1 + hidden_size ..................................... 11600 + hysteresis ...................................... 2 + ict_head_size ................................... None + ict_load ........................................ None + img_dim ......................................... 224 + indexer_batch_size .............................. 128 + indexer_log_interval ............................ 1000 + init_method_std ................................. 0.02 + init_method_xavier_uniform ...................... False + initial_loss_scale .............................. 4294967296 + kv_channels ..................................... 145 + layernorm_epsilon ............................... 1e-05 + lazy_mpu_init ................................... None + load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints + local_rank ...................................... 0 + log_batch_size_to_tensorboard ................... True + log_interval .................................... 1 + log_learning_rate_to_tensorboard ................ True + log_loss_scale_to_tensorboard ................... True + log_num_zeros_in_grad ........................... False + log_params_norm ................................. False + log_timers_to_tensorboard ....................... 
True + log_validation_ppl_to_tensorboard ............... True + loss_on_targets_only ............................ False + loss_scale ...................................... 12.0 + loss_scale_window ............................... 1000 + lr .............................................. 6e-05 + lr_decay_iters .................................. None + lr_decay_samples ................................ None + lr_decay_style .................................. cosine + lr_decay_tokens ................................. 260000000000 + lr_warmup_fraction .............................. None + lr_warmup_iters ................................. 0 + lr_warmup_samples ............................... 216320 + make_vocab_size_divisible_by .................... 128 + mask_prob ....................................... 0.15 + masked_softmax_fusion ........................... False + max_position_embeddings ......................... 2048 + memory_centric_tiled_linear ..................... False + merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt + micro_batch_size ................................ 1 + min_loss_scale .................................. 1.0 + min_lr .......................................... 6e-06 + mmap_warmup ..................................... False + no_load_optim ................................... None + no_load_rng ..................................... None + no_save_optim ................................... None + no_save_rng ..................................... None + num_attention_heads ............................. 80 + num_channels .................................... 3 + num_classes ..................................... 1000 + num_layers ...................................... 64 + num_layers_per_virtual_pipeline_stage ........... None + num_workers ..................................... 2 + onnx_safe ....................................... None + openai_gelu ..................................... False + optimizer ....................................... adam + override_lr_scheduler ........................... False + params_dtype .................................... torch.float16 + partition_activations ........................... False + patch_dim ....................................... 16 + pipeline_model_parallel_size .................... 32 + position_embedding_type ......................... PositionEmbeddingType.absolute + profile_backward ................................ False + query_in_block_prob ............................. 0.1 + rampup_batch_size ............................... None + rank ............................................ 0 + remote_device ................................... none + reset_attention_mask ............................ False + reset_position_ids .............................. False + retriever_report_topk_accuracies ................ [] + retriever_score_scaling ......................... False + retriever_seq_length ............................ 256 + sample_rate ..................................... 1.0 + save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints + save_interval ................................... 300 + scatter_gather_tensors_in_pipeline .............. True + scattered_embeddings ............................ False + seed ............................................ 43 + seq_length ...................................... 2048 + sgd_momentum .................................... 
0.9 + short_seq_prob .................................. 0.1 + split ........................................... 949,50,1 + split_transformers .............................. False + synchronize_each_layer .......................... False + tensor_model_parallel_size ...................... 4 + tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard + tensorboard_log_interval ........................ 1 + tensorboard_queue_size .......................... 5 + tile_factor ..................................... 1 + titles_data_path ................................ None + tokenizer_name_or_path .......................... None + tokenizer_type .................................. GPT2BPETokenizer + train_iters ..................................... None + train_samples ................................... 600000000 + train_tokens .................................... 300000000000 + use_checkpoint_lr_scheduler ..................... False + use_contiguous_buffers_in_ddp ................... False + use_cpu_initialization .......................... None + use_one_sent_docs ............................... False + use_pin_memory .................................. False + virtual_pipeline_model_parallel_size ............ None + vocab_extra_ids ................................. 0 + vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json + weight_decay .................................... 0.1 + world_size ...................................... 128 + zero_allgather_bucket_size ...................... 0.0 + zero_contigious_gradients ....................... False + zero_reduce_bucket_size ......................... 0.0 + zero_reduce_scatter ............................. False + zero_stage ...................................... 1 +-------------------- end of arguments --------------------- +setting number of micro-batches to constant 2048 +> building GPT2BPETokenizer tokenizer ... 
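+A quick cross-check of the parallelism layout implied by the arguments above (a sketch; the numbers come from this log, and the bookkeeping formulas are the usual Megatron-DeepSpeed conventions, assumed rather than quoted from the code):
+
+    # 128 GPUs factor into tensor x pipeline x data parallelism
+    world_size = 128
+    tensor_mp, pipeline_mp = 4, 32
+    data_parallel = world_size // (tensor_mp * pipeline_mp)                 # -> 1
+    # micro_batch_size 1 and data_parallel_size 1 mean a global batch of
+    # 2048 is accumulated over 2048 micro-batches per step, matching the
+    # "setting number of micro-batches to constant 2048" line above
+    micro_batch, global_batch = 1, 2048
+    micro_batches_per_step = global_batch // (micro_batch * data_parallel)  # -> 2048
+    # 600M train samples at 2048 samples per step gives the iteration count
+    # reported later as "setting training iterations to 292968"
+    train_samples = 600_000_000
+    print(data_parallel, micro_batches_per_step, train_samples // global_batch)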
+ > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
+> initializing torch distributed ...
+> setting tensorboard ...
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+--------------------------------------------------
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
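+The "padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)" line above follows from make_vocab_size_divisible_by=128 and tensor_model_parallel_size=4 in the arguments: the vocabulary is padded up to a multiple of their product. A minimal sketch of that computation (the exact padding rule is an assumption inferred from these values):
+
+    import math
+    orig_vocab = 50257
+    multiple = 128 * 4  # make_vocab_size_divisible_by * tensor_model_parallel_size
+    padded = math.ceil(orig_vocab / multiple) * multiple
+    print(padded, padded - orig_vocab)  # -> 50688 431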
+> initializing tensor model parallel with size 4
+> initializing pipeline model parallel with size 32
+> setting random seeds to 43 ...
+[2021-10-18 04:28:14,983] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
+> compiling dataset index builder ...
+make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/data'
+g++ -O3 -Wall -shared -std=c++11 -fPIC -fdiagnostics-color -I/gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -I/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/pybind11/include helpers.cpp -o helpers.cpython-38-x86_64-linux-gnu.so
+make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/data'
+>>> done with dataset index builder. Compilation time: 6.291 seconds
+WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.
+> compiling and loading fused kernels ...
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
+
+                               !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler Pytorch was
+built with for this platform, which is g++ on linux. Please
+use g++ to to compile your extension. Alternatively, you may
+compile PyTorch from source using c++, and then you can also use
+c++ to compile your extension.
+
+See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
+with compiling PyTorch from source.
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+                               !! WARNING !!
+
+  warnings.warn(WRONG_COMPILER_WARNING.format(
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/fused_kernels/build/build.ninja...
+Building extension module fused_mix_prec_layer_norm_cuda...
+Allowing ninja to set a default number of workers...
(overridable by setting the environment variable MAX_JOBS=N)
+[1/3] c++ -MMD -MF layer_norm_cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/fused_kernels/layer_norm_cuda.cpp -o layer_norm_cuda.o
+[2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -maxrregcount=50 -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o
+[3/3] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so
+Loading extension module fused_mix_prec_layer_norm_cuda...
+>>> done with compiling and loading fused kernels. Compilation time: 25.466 seconds
+time to initialize megatron (seconds): 94.777
+[after megatron is initialized] datetime: 2021-10-18 04:28:46
+building GPT model ...
+[2021-10-18 04:28:46,846] [INFO] [utils.py:806:see_memory_usage] Before Building Model
+[2021-10-18 04:28:46,847] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
+[2021-10-18 04:28:46,847] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.54 GB, percent = 21.1%
+SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
+Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, 
ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127} +[2021-10-18 04:28:48,522] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer +stage=0 layers=5 + 0: _to_float16 + 1: EmbeddingPipe + 2: + 3: ParallelTransformerLayerPipe + 4: ParallelTransformerLayerPipe +stage=1 layers=2 + 5: ParallelTransformerLayerPipe + 6: ParallelTransformerLayerPipe +stage=2 layers=2 + 7: ParallelTransformerLayerPipe + 8: ParallelTransformerLayerPipe +stage=3 layers=2 + 9: ParallelTransformerLayerPipe + 10: ParallelTransformerLayerPipe +stage=4 layers=2 + 11: ParallelTransformerLayerPipe + 12: ParallelTransformerLayerPipe +stage=5 layers=2 + 13: ParallelTransformerLayerPipe + 14: ParallelTransformerLayerPipe +stage=6 layers=2 + 15: ParallelTransformerLayerPipe + 16: ParallelTransformerLayerPipe +stage=7 layers=2 + 17: ParallelTransformerLayerPipe + 18: 
ParallelTransformerLayerPipe
+stage=8 layers=2
+    19: ParallelTransformerLayerPipe
+    20: ParallelTransformerLayerPipe
+stage=9 layers=2
+    21: ParallelTransformerLayerPipe
+    22: ParallelTransformerLayerPipe
+stage=10 layers=2
+    23: ParallelTransformerLayerPipe
+    24: ParallelTransformerLayerPipe
+stage=11 layers=2
+    25: ParallelTransformerLayerPipe
+    26: ParallelTransformerLayerPipe
+stage=12 layers=2
+    27: ParallelTransformerLayerPipe
+    28: ParallelTransformerLayerPipe
+stage=13 layers=2
+    29: ParallelTransformerLayerPipe
+    30: ParallelTransformerLayerPipe
+stage=14 layers=2
+    31: ParallelTransformerLayerPipe
+    32: ParallelTransformerLayerPipe
+stage=15 layers=2
+    33: ParallelTransformerLayerPipe
+    34: ParallelTransformerLayerPipe
+stage=16 layers=2
+    35: ParallelTransformerLayerPipe
+    36: ParallelTransformerLayerPipe
+stage=17 layers=2
+    37: ParallelTransformerLayerPipe
+    38: ParallelTransformerLayerPipe
+stage=18 layers=2
+    39: ParallelTransformerLayerPipe
+    40: ParallelTransformerLayerPipe
+stage=19 layers=2
+    41: ParallelTransformerLayerPipe
+    42: ParallelTransformerLayerPipe
+stage=20 layers=2
+    43: ParallelTransformerLayerPipe
+    44: ParallelTransformerLayerPipe
+stage=21 layers=2
+    45: ParallelTransformerLayerPipe
+    46: ParallelTransformerLayerPipe
+stage=22 layers=2
+    47: ParallelTransformerLayerPipe
+    48: ParallelTransformerLayerPipe
+stage=23 layers=2
+    49: ParallelTransformerLayerPipe
+    50: ParallelTransformerLayerPipe
+stage=24 layers=2
+    51: ParallelTransformerLayerPipe
+    52: ParallelTransformerLayerPipe
+stage=25 layers=2
+    53: ParallelTransformerLayerPipe
+    54: ParallelTransformerLayerPipe
+stage=26 layers=2
+    55: ParallelTransformerLayerPipe
+    56: ParallelTransformerLayerPipe
+stage=27 layers=2
+    57: ParallelTransformerLayerPipe
+    58: ParallelTransformerLayerPipe
+stage=28 layers=2
+    59: ParallelTransformerLayerPipe
+    60: ParallelTransformerLayerPipe
+stage=29 layers=2
+    61: ParallelTransformerLayerPipe
+    62: ParallelTransformerLayerPipe
+stage=30 layers=2
+    63: ParallelTransformerLayerPipe
+    64: ParallelTransformerLayerPipe
+stage=31 layers=6
+    65: ParallelTransformerLayerPipe
+    66: ParallelTransformerLayerPipe
+    67:
+    68: MixedFusedLayerNorm
+    69: EmbeddingPipe
+    70: float16_to_fp32
+  loss: CrossEntropy
+ > number of parameters on (tensor, pipeline) model parallel ranks (0-3, 0): 978291800 each
+ > number of parameters on (tensor, pipeline) model parallel ranks (0-3, 1) .. (0-3, 30): 807539800 each
+ > number of parameters on (tensor, pipeline) model parallel ranks (0-3, 31): 978315000 each
+[2021-10-18 04:28:49,231] [INFO] [utils.py:806:see_memory_usage] After Building Model
+[2021-10-18 04:28:49,231] [INFO] [utils.py:807:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
+[2021-10-18 04:28:49,232] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.71 GB, percent = 21.2%
+setting training iterations to 292968
+> learning rate decay style: cosine
+DeepSpeed is enabled.
+[2021-10-18 04:28:49,232] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+cd7967d, git-hash=cd7967d, git-branch=master
+[2021-10-18 04:28:49,271] [INFO] [engine.py:204:__init__] DeepSpeed Flops Profiler Enabled: False
+[2021-10-18 04:28:49,271] [INFO] [engine.py:848:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
+[2021-10-18 04:28:49,271] [INFO] [engine.py:854:_configure_optimizer] Using client Optimizer as basic optimizer
+[2021-10-18 04:28:49,272] [INFO] [engine.py:870:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
+[2021-10-18 04:28:49,272] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
+[2021-10-18 04:28:49,272] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
+[2021-10-18 04:28:49,272] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
+[2021-10-18 04:28:49,272] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
+[2021-10-18 04:28:49,272] [INFO] [stage2.py:113:__init__] CPU Offload: False
+[2021-10-18 04:28:49,272] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
+Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 1 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 0 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 124 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 125 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 2 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 3 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 127 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 126 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+[2021-10-18 04:28:51,104] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
+[2021-10-18 04:28:51,105] [INFO] [utils.py:807:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
+[2021-10-18 04:28:51,105] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.74 GB, percent = 21.2%
+[2021-10-18 04:28:51,159] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
+[2021-10-18 04:28:51,159] [INFO] [utils.py:807:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB
+[2021-10-18 04:28:51,160] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.74 GB, percent = 21.2%
+[2021-10-18 04:28:51,160] [INFO] [stage2.py:474:__init__] optimizer state initialized
+[2021-10-18 04:28:51,189] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
+[2021-10-18 04:28:51,189] [INFO] [utils.py:807:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
+[2021-10-18 04:28:51,190] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.74 GB, percent = 21.2%
+[2021-10-18 04:28:51,190] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
+[2021-10-18 04:28:51,190] [INFO] [engine.py:596:_configure_lr_scheduler] DeepSpeed using client LR scheduler
+[2021-10-18 04:28:51,190] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
+[2021-10-18 04:28:51,190] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
+[2021-10-18 04:28:51,190] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] activation_checkpointing_config {
+ "partition_activations": false,
+ "contiguous_memory_optimization": false,
+ "cpu_checkpointing": false,
+ "number_checkpoints": null,
+ "synchronize_checkpoint_boundary": false,
+ "profile": false
+}
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] amp_enabled .................. False
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] amp_params ................... False
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] curriculum_enabled ........... True
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
+[2021-10-18 04:28:51,190] [INFO] [config.py:944:print] dataloader_drop_last ......... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] disable_allgather ............ False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] dump_state ................... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] elasticity_enabled ........... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] flops_profiler_config ........ {
+ "enabled": false,
+ "profile_step": 1,
+ "module_depth": -1,
+ "top_modules": 1,
+ "detailed": true,
+ "output_file": null
+}
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] fp16_enabled ................. True
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] global_rank .................. 0
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] gradient_accumulation_steps .. 2048
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] loss_scale ................... 0
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] memory_breakdown ............. False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] optimizer_name ............... None
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] optimizer_params ............. None
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] pld_enabled .................. False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] pld_params ................... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] prescale_gradients ........... False
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] quantize_groups .............. 1
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] quantize_offset .............. 1000
+[2021-10-18 04:28:51,191] [INFO] [config.py:944:print] quantize_period .............. 1000
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] quantize_rounding ............ 0
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] quantize_start_bits .......... 16
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] quantize_target_bits ......... 8
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] quantize_training_enabled .... False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] quantize_type ................ 0
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] quantize_verbose ............. False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] scheduler_name ............... None
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] scheduler_params ............. None
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] sparse_attention ............. None
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] steps_per_print .............. 2000
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] tensorboard_enabled .......... False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] tensorboard_output_path ......
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] train_batch_size ............. 2048
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] world_size ................... 1
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] zero_config .................. {
+ "stage": 1,
+ "contiguous_gradients": true,
+ "reduce_scatter": true,
+ "reduce_bucket_size": 5.000000e+08,
+ "allgather_partitions": true,
+ "allgather_bucket_size": 5.000000e+08,
+ "overlap_comm": false,
+ "load_from_fp32_weights": true,
+ "elastic_checkpoint": true,
+ "offload_param": null,
+ "offload_optimizer": null,
+ "sub_group_size": 1.000000e+09,
+ "prefetch_bucket_size": 5.000000e+07,
+ "param_persistence_threshold": 1.000000e+05,
+ "max_live_parameters": 1.000000e+09,
+ "max_reuse_distance": 1.000000e+09,
+ "gather_fp16_weights_on_model_save": false,
+ "ignore_unused_parameters": true,
+ "round_robin_gradients": false,
+ "legacy_stage1": false
+}
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] zero_enabled ................. True
+[2021-10-18 04:28:51,192] [INFO] [config.py:944:print] zero_optimization_stage ...... 1
+[2021-10-18 04:28:51,192] [INFO] [config.py:946:print] json = {
+ "train_micro_batch_size_per_gpu": 1,
+ "train_batch_size": 2.048000e+03,
+ "gradient_clipping": 1.0,
+ "zero_optimization": {
+ "stage": 1
+ },
+ "fp16": {
+ "enabled": true,
+ "loss_scale": 0,
+ "loss_scale_window": 500,
+ "hysteresis": 2,
+ "min_loss_scale": 1,
+ "initial_scale_power": 12
+ },
+ "curriculum_learning": {
+ "enabled": true,
+ "curriculum_type": "seqlen",
+ "min_difficulty": 64,
+ "max_difficulty": 2.048000e+03,
+ "schedule_type": "fixed_linear",
+ "schedule_config": {
+ "total_curriculum_step": 3.600000e+04,
+ "difficulty_step": 8
+ }
+ },
+ "steps_per_print": 2.000000e+03,
+ "wall_clock_breakdown": false
+}
+[2021-10-18 04:28:51,193] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,583] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:28:51,672] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
+[2021-10-18 04:28:51,672] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
+WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ will not load any checkpoints and will start from random
+[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,673] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,674] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
+[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,675] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,676] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,676] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. +[2021-10-18 04:28:51,676] [WARNING] [engine.py:1981:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
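+What these warnings mean: when DeepSpeed's load_checkpoint() is called without an explicit tag, it looks for a one-line text file named `latest` inside the checkpoint directory and reads the newest checkpoint tag (e.g. `global_step12345`) from it; if that file is missing, as here, nothing is loaded and training falls back to random initialization. Below is a minimal sketch of that resolution logic, not DeepSpeed's actual code (its real implementation lives in engine.py), and the helper name is made up:
+
+    import os
+    from typing import Optional
+
+    def resolve_checkpoint_tag(load_dir: str, tag: Optional[str] = None) -> Optional[str]:
+        # Hypothetical helper mirroring the behaviour behind the warnings above.
+        if tag is None:
+            latest_path = os.path.join(load_dir, "latest")
+            if not os.path.isfile(latest_path):
+                # This is the branch hit in this log: no `latest` file, so the
+                # engine returns without loading anything and the caller starts
+                # from randomly initialized weights.
+                return None
+            with open(latest_path) as fd:
+                tag = fd.read().strip()
+        return tag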
+time (ms) | load-checkpoint: 5.46
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
+ warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
+estimated model parameters: 103.3650944
+estimated model parameters: 125.2213504
+estimated model parameters without embeddings: 103.3650944
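+The two per-rank figures come from the parameter estimate in megatron/utils.py:274 that the UserWarning refers to: some ranks report ~125.22B (presumably those holding an embedding copy) and the rest ~103.37B, and the warning notes that the with-embeddings number over-counts under PP > 1 because the first and last pipeline stages each hold a copy of the embeddings. A back-of-the-envelope sketch, assuming the usual GPT-style estimate (12*L*h^2 + 13*L*h for the transformer stack, plus V*h per embedding copy) rather than the exact formula in utils.py:
+
+    def estimate_params_billions(num_layers, hidden, vocab, embedding_copies=1):
+        # Rough GPT-style count, NOT the exact megatron/utils.py formula:
+        # each transformer layer has ~12*h^2 weights plus ~13*h biases/norms,
+        # and each (possibly duplicated) embedding copy adds vocab*hidden.
+        transformer = num_layers * (12 * hidden ** 2 + 13 * hidden)
+        embeddings = embedding_copies * vocab * hidden
+        return (transformer + embeddings) / 1e9
+
+    # Consistent with the log itself: 125.2213504 - 103.3650944 = 21.856256,
+    # i.e. roughly 21.86B parameters of duplicated embedding copies separate
+    # the with- and without-embeddings estimates.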
+[the UserWarning and the per-rank parameter estimates repeat, interleaved, for the remaining ranks]
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
+ warnings.warn("Parameter count with the embeddings will be
inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + 
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter 
count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without 
embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold 
several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + + + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters 
without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters: 125.22432 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + + +estimated model parameters: 103.3650944 +estimated model parameters: 125.22432 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 125.22432 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model 
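The UserWarning above is emitted once per rank: with pipeline parallelism (PP > 1) the first and last pipeline stages each hold a copy of the tied word embeddings, so summing per-rank parameter counts double-counts that matrix. That is why the log reports both an inflated "estimated model parameters" on embedding-holding ranks (125.22432) and a consistent "without embeddings" figure (103.3650944). Below is a minimal sketch of the effect, not the Megatron-DeepSpeed source; the toy sizes and helper names are made up:

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

def count_params_without_embeddings(module: nn.Module) -> int:
    # Assumption: embeddings are nn.Embedding weights; Megatron identifies
    # them differently, but the idea is the same.
    emb = sum(m.weight.numel() for m in module.modules()
              if isinstance(m, nn.Embedding))
    return count_params(module) - emb

vocab, hidden = 1000, 64
embed = nn.Embedding(vocab, hidden)          # tied input/output embedding
stage_first = nn.Sequential(embed, nn.Linear(hidden, hidden))
stage_last = nn.Sequential(nn.Linear(hidden, hidden), embed)  # same object

# Summing per-stage counts (as a naive global reduce would) counts the
# shared embedding twice; excluding embeddings removes the discrepancy.
naive_total = count_params(stage_first) + count_params(stage_last)
dedup_total = (count_params_without_embeddings(stage_first)
               + count_params_without_embeddings(stage_last)
               + embed.weight.numel())
assert naive_total - dedup_total == vocab * hidden
```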
+[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-18 04:28:51
+> building train, validation, and test datasets ...
+ > datasets target sizes (minimum size):
+    train:      600000000
+    validation: 3000320
+    test:       10240
+> building train, validation, and test datasets for GPT ...
+ > building dataset index ...
+    reading sizes...
+    reading pointers...
+    reading document index...
+    creating numpy buffer of mmap...
+    creating memory view of numpy buffer...
+ > finished creating indexed dataset in 0.127187 seconds
+    number of documents: 304230423
+ > dataset split:
+    train:
+     document indices in [0, 288714672) total of 288714672 documents
+    validation:
+     document indices in [288714672, 303926193) total of 15211521 documents
+    test:
+     document indices in [303926193, 304230423) total of 304230 documents
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.262 seconds
+    total number of samples: 657686117
+    total number of epochs: 5
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.156 seconds
+    total number of samples: 6927161
+    total number of epochs: 1
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.056 seconds
+    total number of samples: 137384
+    total number of epochs: 1
+> finished creating GPT datasets ...
+[after dataloaders are built] datetime: 2021-10-18 04:28:57
+done with setup ...
+training ...
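The split boundaries printed above are consistent with a 949/50/1 train/validation/test split of the 304,230,423 documents (288,714,672 / 15,211,521 / 304,230), and the train loader reports 5 epochs because the requested 600,000,000 samples exceed four passes over the data (657,686,117 samples over 5 epochs is roughly 131.5M per epoch, so 4 epochs would yield only about 526M). A sketch of the boundary arithmetic, modeled loosely on Megatron-LM's split handling; the `--split 949 50 1` value is inferred from the ratios, and exact rounding at the edges may differ by a document:

```python
size = 304_230_423            # number of documents reported above
splits = [949, 50, 1]         # assumed --split value; inferred from the ratios
total = sum(splits)

bounds = [0]
for s in splits:
    bounds.append(bounds[-1] + int(round(s / total * size)))
bounds[-1] = size             # clamp the last boundary to the corpus size

for name, lo, hi in zip(("train", "validation", "test"), bounds, bounds[1:]):
    print(f"{name}: document indices in [{lo}, {hi}) total of {hi - lo} documents")
# Matches the logged ranges to within +/-1 document.
```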
+Number of parameters: 103.3650944 billion
+Number of parameters without embeddings: 103.3650944 billion
+Number of parameters: 125.2213504 billion
+Number of parameters: 125.22432 billion
+Number of parameters without embeddings: 103.368064 billion
+time (ms) | model-and-optimizer-setup: 4896.38 | train/valid/test-data-iterators-setup: 5425.07
+[before the start of training step] datetime: 2021-10-18 04:28:57
+[2021-10-18 04:28:57,799] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
+[2021-10-18 04:28:57,799] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
+[2021-10-18 04:28:57,799] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers
+[2021-10-18 04:28:57,799] [INFO] [checkpointing.py:554:forward] ----Synchronization False
+[2021-10-18 04:28:57,799] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
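The Activation Checkpointing Information block mirrors the `activation_checkpointing` section of the DeepSpeed config used to launch the run. A hedged sketch of a config fragment that would produce exactly these flags (key names are DeepSpeed's documented ones; the rest of the config is omitted):

```python
# Assumed DeepSpeed config fragment corresponding to the INFO lines above;
# passed to deepspeed.initialize(..., config=ds_config) in Megatron-DeepSpeed.
ds_config = {
    "activation_checkpointing": {
        "partition_activations": False,            # ----Partition Activations False
        "cpu_checkpointing": False,                # ----CPU CHECKPOINTING False
        "contiguous_memory_optimization": False,   # ----contiguous Memory Checkpointing False
        "synchronize_checkpoint_boundary": False,  # ----Synchronization False
        "profile": False,                          # ----Profiling time in checkpointing False
    },
}
```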
+slurmstepd: error: *** STEP 1587010.0 ON r6i4n4 CANCELLED AT 2021-10-18T04:42:03 ***
+srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
+Killing subprocess 2635825
+Main process received SIGTERM, exiting
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
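
The OMP_NUM_THREADS banner above is emitted by the PyTorch distributed launcher once per spawned rank: it defaults the variable to 1 so that many ranks sharing a node do not oversubscribe the CPUs. A minimal sketch of pinning the value explicitly in a worker entrypoint instead of relying on that default (the "1" here is the same conservative choice, not a tuned value):

    import os

    # Set before importing libraries that size their thread pools at import time.
    os.environ.setdefault("OMP_NUM_THREADS", "1")

    import torch

    # Keep torch's intra-op thread pool consistent with the OpenMP setting;
    # raise both together if profiling shows idle CPU headroom.
    torch.set_num_threads(int(os.environ["OMP_NUM_THREADS"]))
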
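The report that follows is DeepSpeed's C++/CUDA extension op table: [YES]/[NO] marks whether an op was prebuilt at install time, and [OKAY] in the compatible column means the op can still be JIT-compiled at runtime (which requires ninja). Assuming DeepSpeed is installed, the same table can be reproduced outside a training job with the bundled `ds_report` console script; as a rough programmatic sketch (builder import paths vary across DeepSpeed versions):

    # Check per-op compatibility the way ds_report does, for two of the ops
    # listed below. is_compatible() mirrors the "compatible" column.
    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder

    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
        print(builder.NAME, builder.is_compatible())
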
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... 
[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... 
[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. 
compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... 
[OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +stochastic_transformer . [NO] ....... [OKAY] +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninja .................................... [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- + +op nameop name ................................ installedinstalled .... compatiblecompatible + +---------------------------------------------------------------------------------------------------- + +cpu_adamcpu_adam .............................. [YES][YES] ...... ......[OKAY] +[OKAY] +fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] + +fused_lambfused_lamb .......................... [NO][NO] .............. [OKAY][OKAY] + +sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] + +transformertransformer ........................ [NO][NO] .............. [OKAY][OKAY] + +stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] + +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installedninja .. ..................compatible +[OKAY]-------------------------------------------------- + +-------------------------------------------------- +op name ................ installed cpu_adam.. ...............compatible +[YES] --------------------------------------------------...... + [OKAY] +cpu_adam ............... [YES] ...... fused_adam[OKAY] +............. [NO] ....... [OKAY] +fused_lamb .............fused_adam [NO]............. [NO]....... .......[OKAY] +[OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +transformer sparse_attn............ ............[NO] .......[NO] [OKAY]....... +ninja .................. [OKAY] +-------------------------------------------------- +JIT compiled ops requires ninja + [OKAY] +op name ................ installed .. compatible +stochastic_transformer transformer ............. 
[NO][NO] .............. [OKAY][OKAY] + +-------------------------------------------------- +stochastic_transformer . [NO] ....... [OKAY] +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +-------------------------------------------------- +fused_adam ............. [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +op name ................ installed .. 
compatible +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +fused_lamb ............. [NO] ....... [OKAY] +cpu_adam ............... [YES]ninja ........................ [OKAY][OKAY] + +-------------------------------------------------- +op name ................ installed .. fused_adamcompatible + .............-------------------------------------------------- +sparse_attn ............ [NO] ....... [OKAY] +[NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +cpu_adamfused_lamb ............................ [YES] [NO]...... .......[OKAY] + [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +fused_adam ............. [NO]sparse_attn ................... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +[NO] ....... [OKAY] +-------------------------------------------------- +JIT compiled ops requires ninja +fused_lamb .............transformer [NO]............ ....... [NO][OKAY] + ....... [OKAY] +stochastic_transformer . [NO] .......sparse_attn [OKAY]............ + [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +ninja .................. [OKAY] +cpu_adam ............... [YES] ...... [OKAY] +-------------------------------------------------- +fused_adam ............. [NO] ....... [OKAY] +op name ................ installed .. compatible +fused_lamb ............. [NO] ....... [OKAY] +-------------------------------------------------- +sparse_attn ............ [NO] ....... [OKAY] +cpu_adam ............... [YES] ...... 
[OKAY] +transformer ............ [NO] ....... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +ninja .................. [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +ninja .................. [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +sparse_attn ............ [NO] ....... [OKAY] +cpu_adam ............... [YES] ...... [OKAY] +transformer ............ [NO] ....... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninjaninja .................................... [OKAY][OKAY] + +-------------------------------------------------- +-------------------------------------------------- +op name op name................ ................installed installed.. ..compatible +compatible +---------------------------------------------------------------------------------------------------- + +cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY][OKAY] + +fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] + +fused_lamb .............fused_lamb [NO]............. .......[NO] [OKAY]....... + [OKAY] +sparse_attn ............ sparse_attn[NO] ................... [NO][OKAY] +....... [OKAY] +transformer ............transformer [NO]............ .......[NO] [OKAY]....... + [OKAY] +stochastic_transformer stochastic_transformer. [NO]. .......[NO] [OKAY]....... 
+ [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +ninja .................. fused_adam[OKAY] +............. --------------------------------------------------[NO] + .......op name [OKAY]................ + installed .. fused_lambcompatible +.............-------------------------------------------------- +[NO] ....... [OKAY] +cpu_adam ............... [YES] ...... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformerfused_adam ......................... [NO][NO] .............. [OKAY][OKAY] + +stochastic_transformer fused_lamb .............. [NO][NO] .............. [OKAY][OKAY] + +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . 
[NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... 
[OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +--------------------------------------------------JIT compiled ops requires ninja + +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninja .................................... [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- + +op nameop name ................................ installedinstalled .... compatible +-------------------------------------------------- +compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] fused_adam....... .............[OKAY] +[NO] ....... [OKAY] +sparse_attn fused_lamb............ .............[NO] [NO]....... .......[OKAY] +[OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer sparse_attn. ............[NO] ....... [OKAY][NO] + ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... 
[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +--------------------------------------------------JIT compiled ops requires ninja + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attnninja .............................. [NO][OKAY] +....... --------------------------------------------------[OKAY] + +op nametransformer ................ ............installed [NO].. .......compatible +[OKAY]-------------------------------------------------- + +stochastic_transformer . cpu_adam[NO] ...................... [YES] [OKAY]...... + [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op reportfused_adam + .............-------------------------------------------------- [NO] + NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op........ + [OKAY]-------------------------------------------------- + +JIT compiled ops requires ninja +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +ninja .................. [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +op name ................ 
installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninjaninja .................................... [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- + +op nameop name ................................ installedinstalled .... compatiblecompatible + +---------------------------------------------------------------------------------------------------- + +cpu_adam cpu_adam............... [YES]............... ......[YES] [OKAY]...... + [OKAY] +fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] + +fused_lamb fused_lamb............. .............[NO] [NO]....... .......[OKAY] +[OKAY] +sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... + [OKAY] +transformer transformer............ ............[NO] [NO]....... .......[OKAY] +[OKAY] +stochastic_transformer stochastic_transformer . .[NO] [NO]....... 
.......[OKAY] +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +ninjaninja .................. ..................[OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- + +op nameop name ................................ installedinstalled .... compatiblecompatible + +---------------------------------------------------------------------------------------------------- + +cpu_adam cpu_adam............... ...............[YES] [YES]...... ......[OKAY] +[OKAY] +fused_adamfused_adam .......................... [NO][NO] .............. [OKAY][OKAY] + +fused_lamb .............fused_lamb [NO]............. .......[NO] [OKAY]....... + [OKAY] +sparse_attn sparse_attn............ ............[NO] [NO]....... .......[OKAY] +[OKAY] +transformer transformer............ ............[NO] [NO]....... .......[OKAY] +[OKAY] +stochastic_transformer stochastic_transformer . .[NO] [NO]....... .......[OKAY] +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... 
[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. compatible +-------------------------------------------------- +cpu_adam ............... [YES] ...... [OKAY] +fused_adam ............. [NO] ....... [OKAY] +fused_lamb ............. [NO] ....... [OKAY] +sparse_attn ............ [NO] ....... [OKAY] +transformer ............ [NO] ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] +-------------------------------------------------- +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + +DeepSpeed C++/CUDA extension op report +--------------------------------------------------DeepSpeed C++/CUDA extension op report +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +--------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- + +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +JIT compiled ops requires ninja +-------------------------------------------------- + +-------------------------------------------------- +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninja .................. [OKAY] +-------------------------------------------------- +op name ................ installed .. 
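A quick gloss on the report above: ops marked [NO] under "installed" are still usable; DeepSpeed builds them with ninja on first use. The sketch below is a hypothetical probe rather than anything this run executed; it shows how that JIT path can be triggered by hand for one of the ops named in the table, assuming a working DeepSpeed install plus a CUDA toolchain and ninja on the node:

# Sketch only: manually trigger the JIT build path the report describes.
# Assumes deepspeed is importable; CPUAdamBuilder is the builder behind the
# "cpu_adam" row above. .load() returns the compiled extension, building it
# first if it was not pre-built at install time.
from deepspeed.ops.op_builder import CPUAdamBuilder

cpu_adam_ext = CPUAdamBuilder().load()
print(cpu_adam_ext)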
compatible +-------------------------------------------------- +ninja .................. [OKAY] +cpu_adam ...............-------------------------------------------------- +[YES]ninja ......op name..................ninja [OKAY][OKAY].................. +................ + --------------------------------------------------[OKAY]installed + + op name..-------------------------------------------------- +................compatibleop name + installed................ -------------------------------------------------- ..installed + fused_adamcompatible.. +............. -------------------------------------------------- +compatiblecpu_adam + ...............--------------------------------------------------[NO] [YES] + cpu_adam............. ............... [OKAY] [OKAY] +[YES] +cpu_adam ......fused_lamb ............... [OKAY] ............. + fused_adam[YES][NO] ................... [OKAY][NO] + ....... .......fused_adam[OKAY] +............. [OKAY][NO] +fused_lambfused_adam ....... ............. ............. [OKAY] [NO] +[NO] .......fused_lamb....... [OKAY].............[OKAY] + +[NO] fused_lamb....... sparse_attn.............[OKAY] + [NO] ...................sparse_attn [NO] [OKAY] ............ + .......sparse_attn[NO] ............ ....... [NO] [OKAY] [OKAY] +sparse_attn +....... ............transformer[OKAY] +transformer[NO]transformer............ ............ [NO]............[NO]....... ....... ....... [OKAY][OKAY][OKAY] + +[NO] + ....... transformerstochastic_transformerstochastic_transformer ............[OKAY]. . + [NO][NO][NO] stochastic_transformer..................... [OKAY][OKAY] [OKAY] + + +.stochastic_transformer [NO] ........ [OKAY] +[NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 
1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +async_io ............... [NO] ....... [NO] +-------------------------------------------------- +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... 
[NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +DeepSpeed general environment info: + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +utils .................. [YES] ...... [OKAY] +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +quantizer .............. [NO] ....... [OKAY] +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +-------------------------------------------------- +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+ [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +utils .................. [YES] ...... [OKAY] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: + [WARNING]  async_io: please install the libaio-devel package with yum +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 
11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] + [WARNING]  async_io: please install the libaio-devel package with yum +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +DeepSpeed general environment info: +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 
11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +utils .................. [YES] ...... [OKAY] +async_io  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found................ +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [NO] ....... [NO] +async_io ............... transformer_inference[NO] ......... [NO][NO] +....... [OKAY] +utils ..................transformer_inference [YES].. ......[NO] [OKAY]....... + [OKAY] +quantizer .............. [NO]utils ......................... [OKAY][YES] + ...... [OKAY] +-------------------------------------------------- +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found................ + [NO] ....... [NO] +async_io ............... transformer_inference[NO] ......... 
[NO][NO] +....... [OKAY] +utils .................. [YES] ...... [OKAY]transformer_inference + .. quantizer[NO] ..................... [NO][OKAY] +....... [OKAY] +utils-------------------------------------------------- +.................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] + [WARNING]  async_io: please install the libaio-devel package with yum +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. 
[YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... 
[OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [YES] ...... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [YES] ...... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+cd7967d, cd7967d, master
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
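The async_io warnings above are emitted once per rank and are advisory: the op stays [NO] because the libaio development headers and .so are not visible to DeepSpeed's op builder. A minimal sketch of the remedy the warning itself suggests, assuming a from-source libaio under a hypothetical prefix; only the CFLAGS/LDFLAGS hint and the op table come from this log, the paths and the editable-install step are assumptions:

    # Sketch: point DeepSpeed's op build at a from-source libaio, then
    # re-check the ops. The /path/to/libaio prefix is hypothetical.
    import os
    import subprocess

    os.environ["CFLAGS"] = "-I/path/to/libaio/include"   # assumed install prefix
    os.environ["LDFLAGS"] = "-L/path/to/libaio/lib"      # assumed install prefix
    os.environ["DS_BUILD_AIO"] = "1"  # pre-build the async_io op at install time

    # Rebuild DeepSpeed from its checkout, then print the same op
    # compatibility table and environment info seen above.
    subprocess.run(["pip", "install", "--no-cache-dir", "-e", "."], check=True)
    subprocess.run(["ds_report"], check=True)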
+**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ****
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +DeepSpeed general environment info:torch install path + ............... torch install path ...............['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version ....................['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +1.8.1 +torch cuda versiontorch version ................................... 11.1 +1.8.1 +nvcc version .....................torch cuda version 11.2............... + deepspeed install path11.1 +...........nvcc version ..................... 11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed install pathdeepspeed info .............................. 0.5.5+cd7967d, cd7967d, master +['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']deepspeed wheel compiled w. + ......deepspeed info torch 1.8, cuda 11.1 +................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... DeepSpeed general environment info: +['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch install pathtorch version .................... ...............1.8.1 +torch cuda version ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']11.1 + +nvcc versiontorch version ......................................... 11.2 +1.8.1 +deepspeed install path ...........torch cuda version ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']11.1 + +deepspeed infonvcc version ........................................ 0.5.5+cd7967d, cd7967d, master11.2 + +deepspeed wheel compiled w.deepspeed install path ................. torch 1.8, cuda 11.1 +['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 
11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+cd7967d, cd7967d, master0.5.5+cd7967d, cd7967d, master + +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 
1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+cd7967d, cd7967d, master0.5.5+cd7967d, cd7967d, master + +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+cd7967d, cd7967d, master0.5.5+cd7967d, cd7967d, master + +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+cd7967d, cd7967d, master +deepspeed wheel compiled w. ...... 
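+The versions in the report above can be sanity-checked from any Python shell in the same conda environment; a minimal sketch using only standard torch/DeepSpeed attributes (nothing project-specific):
+
+    import torch
+    import deepspeed
+    # Expected to match the report: 1.8.1, 11.1, 0.5.5+cd7967d.
+    print(torch.__version__, torch.version.cuda, deepspeed.__version__)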
+using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32
+using torch.float16 for parameters ...
+------------------------ arguments ------------------------
+ accumulate_allreduce_grads_in_fp32 .............. False
+ adam_beta1 ...................................... 0.9
+ adam_beta2 ...................................... 0.95
+ adam_eps ........................................ 1e-08
+ adlr_autoresume ................................. False
+ adlr_autoresume_interval ........................ 1000
+ apply_query_key_layer_scaling ................... True
+ apply_residual_connection_post_layernorm ........ False
+ attention_dropout ............................... 0.1
+ attention_softmax_in_fp32 ....................... False
+ bert_binary_head ................................ True
+ bert_load ....................................... None
+ bf16 ............................................ False
+ bias_dropout_fusion ............................. True
+ bias_gelu_fusion ................................ True
+ biencoder_projection_dim ........................ 0
+ biencoder_shared_query_context_model ............ False
+ block_data_path ................................. None
+ checkpoint_activations .......................... True
+ checkpoint_in_cpu ............................... False
+ checkpoint_num_layers ........................... 1
+ clip_grad ....................................... 1.0
+ codecarbon_dir .................................. None
+ consumed_train_samples .......................... 0
+ consumed_train_tokens ........................... 0
+ consumed_valid_samples .......................... 0
+ contigious_checkpointing ........................ False
+ cpu_optimizer ................................... False
+ cpu_torch_adam .................................. False
+ curriculum_learning ............................. False
+ data_impl ....................................... mmap
+ data_parallel_size .............................. 1
+ data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
+ dataloader_type ................................. single
+ DDP_impl ........................................ local
+ decoder_seq_length .............................. None
+ deepscale ....................................... False
+ deepscale_config ................................ None
+ deepspeed ....................................... True
+ deepspeed_activation_checkpointing .............. True
+ deepspeed_config ................................ ./ds_config.1587017.json
+ deepspeed_mpi ................................... False
+ distribute_checkpointed_activations ............. False
+ distributed_backend ............................. nccl
+ embedding_path .................................. None
+ encoder_seq_length .............................. 2048
+ eod_mask_loss ................................... False
+ eval_interval ................................... 1000
+ eval_iters ...................................... 5
+ evidence_data_path .............................. None
+ exit_duration_in_mins ........................... 55
+ exit_interval ................................... None
+ ffn_hidden_size ................................. 46400
+ finetune ........................................ False
+ fp16 ............................................ True
+ fp16_lm_cross_entropy ........................... False
+ fp32_residual_connection ........................ False
+ gigaflos_no_embeds .............................. 0
+ global_batch_size ............................... 2048
+ glu_activation .................................. None
+ hidden_dropout .................................. 0.1
+ hidden_size ..................................... 11600
+ hysteresis ...................................... 2
+ ict_head_size ................................... None
+ ict_load ........................................ None
+ img_dim ......................................... 224
+ indexer_batch_size .............................. 128
+ indexer_log_interval ............................ 1000
+ init_method_std ................................. 0.02
+ init_method_xavier_uniform ...................... False
+ initial_loss_scale .............................. 4294967296
+ kv_channels ..................................... 145
+ layernorm_epsilon ............................... 1e-05
+ lazy_mpu_init ................................... None
+ load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ local_rank ...................................... 0
+ log_batch_size_to_tensorboard ................... True
+ log_interval .................................... 1
+ log_learning_rate_to_tensorboard ................ True
+ log_loss_scale_to_tensorboard ................... True
+ log_num_zeros_in_grad ........................... False
+ log_params_norm ................................. False
+ log_timers_to_tensorboard ....................... True
+ log_validation_ppl_to_tensorboard ............... True
+ loss_on_targets_only ............................ False
+ loss_scale ...................................... 12.0
+ loss_scale_window ............................... 1000
+ lr .............................................. 6e-05
+ lr_decay_iters .................................. None
+ lr_decay_samples ................................ None
+ lr_decay_style .................................. cosine
+ lr_decay_tokens ................................. 260000000000
+ lr_warmup_fraction .............................. None
+ lr_warmup_iters ................................. 0
+ lr_warmup_samples ............................... 216320
+ make_vocab_size_divisible_by .................... 128
+ mask_prob ....................................... 0.15
+ masked_softmax_fusion ........................... False
+ max_position_embeddings ......................... 2048
+ memory_centric_tiled_linear ..................... False
+ merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt
+ micro_batch_size ................................ 1
+ min_loss_scale .................................. 1.0
+ min_lr .......................................... 6e-06
+ mmap_warmup ..................................... False
+ no_load_optim ................................... None
+ no_load_rng ..................................... None
+ no_save_optim ................................... None
+ no_save_rng ..................................... None
+ num_attention_heads ............................. 80
+ num_channels .................................... 3
+ num_classes ..................................... 1000
+ num_layers ...................................... 64
+ num_layers_per_virtual_pipeline_stage ........... None
+ num_workers ..................................... 2
+ onnx_safe ....................................... None
+ openai_gelu ..................................... False
+ optimizer ....................................... adam
+ override_lr_scheduler ........................... False
+ params_dtype .................................... torch.float16
+ partition_activations ........................... False
+ patch_dim ....................................... 16
+ pipeline_model_parallel_size .................... 32
+ position_embedding_type ......................... PositionEmbeddingType.absolute
+ profile_backward ................................ False
+ query_in_block_prob ............................. 0.1
+ rampup_batch_size ............................... None
+ rank ............................................ 0
+ remote_device ................................... none
+ reset_attention_mask ............................ False
+ reset_position_ids .............................. False
+ retriever_report_topk_accuracies ................ []
+ retriever_score_scaling ......................... False
+ retriever_seq_length ............................ 256
+ sample_rate ..................................... 1.0
+ save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ save_interval ................................... 300
+ scatter_gather_tensors_in_pipeline .............. True
+ scattered_embeddings ............................ False
+ seed ............................................ 43
+ seq_length ...................................... 2048
+ sgd_momentum .................................... 0.9
+ short_seq_prob .................................. 0.1
+ split ........................................... 949,50,1
+ split_transformers .............................. False
+ synchronize_each_layer .......................... False
+ tensor_model_parallel_size ...................... 4
+ tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard
+ tensorboard_log_interval ........................ 1
+ tensorboard_queue_size .......................... 5
+ tile_factor ..................................... 1
+ titles_data_path ................................ None
+ tokenizer_name_or_path .......................... None
+ tokenizer_type .................................. GPT2BPETokenizer
+ train_iters ..................................... None
+ train_samples ................................... 600000000
+ train_tokens .................................... 300000000000
+ use_checkpoint_lr_scheduler ..................... False
+ use_contiguous_buffers_in_ddp ................... False
+ use_cpu_initialization .......................... None
+ use_one_sent_docs ............................... False
+ use_pin_memory .................................. False
+ virtual_pipeline_model_parallel_size ............ None
+ vocab_extra_ids ................................. 0
+ vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
+ weight_decay .................................... 0.1
+ world_size ...................................... 128
+ zero_allgather_bucket_size ...................... 0.0
+ zero_contigious_gradients ....................... False
+ zero_reduce_bucket_size ......................... 0.0
+ zero_reduce_scatter ............................. False
+ zero_stage ...................................... 1
+-------------------- end of arguments ---------------------
+setting number of micro-batches to constant 2048
+> building GPT2BPETokenizer tokenizer ...
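+The parallel layout implied by these arguments can be checked in a few lines of Python; this is a minimal sketch with our own variable names, mirroring (not quoting) Megatron's bookkeeping:
+
+    # 3D parallelism: 4-way tensor * 32-way pipeline * 1-way data = 128 GPUs.
+    tp, pp, dp = 4, 32, 1
+    assert tp * pp * dp == 128                  # world_size in the arguments
+    # With micro_batch_size 1 and dp = 1, each global batch of 2048 samples
+    # becomes 2048 micro-batches flowing through the 32-stage pipeline.
+    global_batch_size, micro_batch_size = 2048, 1
+    num_micro_batches = global_batch_size // (micro_batch_size * dp)
+    assert num_micro_batches == 2048            # "constant 2048" above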
+ > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
+> initializing torch distributed ...
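+The 431 dummy tokens follow from make_vocab_size_divisible_by = 128 and tensor_model_parallel_size = 4; a sketch of the padding rule (our reconstruction of Megatron's vocab padding, not its code):
+
+    import math
+    def padded_vocab(orig=50257, divisible_by=128, tp=4):
+        # The vocab is padded to a multiple of divisible_by * tp so that each
+        # tensor-parallel shard of the embedding gets equal, aligned rows.
+        multiple = divisible_by * tp                   # 128 * 4 = 512
+        new = math.ceil(orig / multiple) * multiple    # 50688
+        return new, new - orig                         # (50688, 431), as logged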
+> setting tensorboard ...
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+ runtime if needed. Op compatibility means that your system
+ meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [YES] ...... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
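+Ops reported [NO] ....... [OKAY] are compatible but not prebuilt, so they are JIT-compiled on first use. One can force a build ahead of time; a hedged example, assuming the deepspeed.ops.op_builder layout of this DeepSpeed version:
+
+    from deepspeed.ops.op_builder import FusedAdamBuilder
+    # Compile and load the fused Adam CUDA kernel now (requires ninja, per
+    # the report above) instead of paying the cost at the first optimizer step.
+    fused_adam_module = FusedAdamBuilder().load()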
+> initializing tensor model parallel with size 4
+> initializing pipeline model parallel with size 32
+> setting random seeds to 43 ...
+[2021-10-18 04:45:50,651] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
+> compiling dataset index builder ...
+make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/data'
+make: Nothing to be done for 'default'.
+make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/data'
+>>> done with dataset index builder. Compilation time: 0.302 seconds
+WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.
+> compiling and loading fused kernels ...
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
+
+ !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler Pytorch was
+built with for this platform, which is g++ on linux. Please
+use g++ to compile your extension. Alternatively, you may
+compile PyTorch from source using c++, and then you can also use
+c++ to compile your extension.
+
+See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
+with compiling PyTorch from source.
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ !! WARNING !!
+
+ warnings.warn(WRONG_COMPILER_WARNING.format(
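+The compiler warning appears harmless in this run (the prebuilt extension loads below with no rebuild), but one way to avoid it, assuming the environment actually provides g++ on PATH, is to point torch.utils.cpp_extension at the matching compiler before launch:
+
+    import os
+    # Sketch, not from the training scripts: cpp_extension falls back to the
+    # generic `c++` alias unless CXX is set; selecting g++ explicitly matches
+    # the compiler PyTorch was built with and silences the mismatch warning.
+    os.environ["CXX"] = "g++"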
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/fused_kernels/build/build.ninja...
+Building extension module fused_mix_prec_layer_norm_cuda...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+ninja: no work to do.
+Loading extension module fused_mix_prec_layer_norm_cuda...
+>>> done with compiling and loading fused kernels. Compilation time: 4.130 seconds
+time to initialize megatron (seconds): -28.915
+[after megatron is initialized] datetime: 2021-10-18 04:45:55
+building GPT model ...
+[2021-10-18 04:45:55,148] [INFO] [utils.py:806:see_memory_usage] Before Building Model
+[2021-10-18 04:45:55,149] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
+[2021-10-18 04:45:55,149] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.55 GB, percent = 21.1%
+SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
+Using topology: {ProcessCoord(pipe=p, data=0, model=m): 4*p + m for p in 0..31, m in 0..3} (128 ranks)
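+The 128-entry mapping reduces to the single rule rank = 4*pipe + model (model-parallel innermost, pipeline outermost, data-parallel degree 1); a sketch that rebuilds it as a plain dict, not DeepSpeed's topology object:
+
+    # (pipe, data, model) -> global rank, matching the layout printed above.
+    topology = {(pipe, 0, model): 4 * pipe + model
+                for pipe in range(32) for model in range(4)}
+    # Spot-check against the log: ProcessCoord(pipe=18, data=0, model=1) -> 73.
+    assert topology[(18, 0, 1)] == 73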
+[2021-10-18 04:45:56,825] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
+stage=0 layers=5
+ 0: _to_float16
+ 1: EmbeddingPipe
+ 2:
+ 3: ParallelTransformerLayerPipe
+ 4: ParallelTransformerLayerPipe
+stage=1 layers=2
+ 5: ParallelTransformerLayerPipe
+ 6: ParallelTransformerLayerPipe
+stage=2 layers=2
+ 7: ParallelTransformerLayerPipe
+ 8: ParallelTransformerLayerPipe
+stage=3 layers=2
+ 9: ParallelTransformerLayerPipe
+ 10: ParallelTransformerLayerPipe
+stage=4 layers=2
+ 11: ParallelTransformerLayerPipe
+ 12: ParallelTransformerLayerPipe
+stage=5 layers=2
+ 13: ParallelTransformerLayerPipe
+ 14: ParallelTransformerLayerPipe
+stage=6 layers=2
+ 15: ParallelTransformerLayerPipe
+ 16: ParallelTransformerLayerPipe
+stage=7 layers=2
+ 17: ParallelTransformerLayerPipe
+ 18: ParallelTransformerLayerPipe
+stage=8 layers=2
+ 19: ParallelTransformerLayerPipe
+ 20: ParallelTransformerLayerPipe
+stage=9 layers=2
+ 21: ParallelTransformerLayerPipe
+ 22: ParallelTransformerLayerPipe
+stage=10 layers=2
+ 23: ParallelTransformerLayerPipe
+ 24: ParallelTransformerLayerPipe
+stage=11 layers=2
+ 25: ParallelTransformerLayerPipe
+ 26: ParallelTransformerLayerPipe
+stage=12 layers=2
+ 27: ParallelTransformerLayerPipe
+ 28: ParallelTransformerLayerPipe
+stage=13 layers=2
+ 29: ParallelTransformerLayerPipe
+ 30: ParallelTransformerLayerPipe
+stage=14 layers=2
+ 31: ParallelTransformerLayerPipe
+ 32: ParallelTransformerLayerPipe
+stage=15 layers=2
+ 33: ParallelTransformerLayerPipe
+ 34: ParallelTransformerLayerPipe
+stage=16 layers=2
+ 35: ParallelTransformerLayerPipe
+ 36: ParallelTransformerLayerPipe
+stage=17 layers=2
+ 37: ParallelTransformerLayerPipe
+ 38: ParallelTransformerLayerPipe
+stage=18 layers=2
+ 39: ParallelTransformerLayerPipe
+ 40: ParallelTransformerLayerPipe
+stage=19 layers=2
+ 41: ParallelTransformerLayerPipe
+ 42: ParallelTransformerLayerPipe
+stage=20 layers=2
+ 43: ParallelTransformerLayerPipe
+ 44: ParallelTransformerLayerPipe
+stage=21 layers=2
+ 45: ParallelTransformerLayerPipe
+ 46: ParallelTransformerLayerPipe
+stage=22 layers=2
+ 47: ParallelTransformerLayerPipe
+ 48: ParallelTransformerLayerPipe
+stage=23 layers=2
+ 49: ParallelTransformerLayerPipe
+ 50: ParallelTransformerLayerPipe
+stage=24 layers=2
+ 51: ParallelTransformerLayerPipe
+ 52: ParallelTransformerLayerPipe
+stage=25 layers=2
+ 53: ParallelTransformerLayerPipe
+ 54: ParallelTransformerLayerPipe
+stage=26 layers=2
+ 55: ParallelTransformerLayerPipe
+ 56: ParallelTransformerLayerPipe
+stage=27 layers=2
+ 57: ParallelTransformerLayerPipe
+ 58: ParallelTransformerLayerPipe
+stage=28 layers=2
+ 59: ParallelTransformerLayerPipe
+ 60: ParallelTransformerLayerPipe
+stage=29 layers=2
+ 61: ParallelTransformerLayerPipe
+ 62: ParallelTransformerLayerPipe
+stage=30 layers=2
+ 63: ParallelTransformerLayerPipe
+ 64: ParallelTransformerLayerPipe
+stage=31 layers=6
+ 65: ParallelTransformerLayerPipe
+ 66: ParallelTransformerLayerPipe
+ 67:
+ 68: MixedFusedLayerNorm
+ 69: EmbeddingPipe
+ 70: float16_to_fp32
+ loss: CrossEntropy
+ > number of parameters on (tensor, pipeline) model parallel rank (t, p): 807539800 for every t = 0..3 and p = 1..30 (120 ranks, reported in completion order)
+ > number of parameters on (tensor, pipeline) model parallel rank (t, 0): 978291800 for every t = 0..3
+ > number of parameters on (tensor, pipeline) model parallel rank (t, 31): 978315000 for every t = 0..3
+[2021-10-18 04:45:57,517] [INFO] [utils.py:806:see_memory_usage] After Building Model
+[2021-10-18 04:45:57,518] [INFO] [utils.py:807:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
+[2021-10-18 04:45:57,518] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.72 GB, percent = 21.2%
+setting training iterations to 292968
+> learning rate decay style: cosine
+DeepSpeed is enabled.
+[2021-10-18 04:45:57,519] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+cd7967d, git-hash=cd7967d, git-branch=master
+[2021-10-18 04:45:57,556] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
+[2021-10-18 04:45:57,556] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
+[2021-10-18 04:45:57,556] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
+[2021-10-18 04:45:57,557] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
+[2021-10-18 04:45:57,557] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
+[2021-10-18 04:45:57,557] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
+[2021-10-18 04:45:57,557] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
+[2021-10-18 04:45:57,557] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
+[2021-10-18 04:45:57,557] [INFO] [stage2.py:113:__init__] CPU Offload: False
+[2021-10-18 04:45:57,557] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
+Rank: 10 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 102 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 31 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 104 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 107 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 23 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 51 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 123 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 110 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 81 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 61 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 99 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 93 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 64 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 26 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 109 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 84 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 69 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 46 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 56 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 16 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 114 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 112 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 35 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 8 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 63 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 41 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 5 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 20 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 75 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 120 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 48 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 100 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 83 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 70 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 18 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 94 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 116 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 76 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 59 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 88 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 38 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 36 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 13 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 86 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 77 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 67 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 32 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 89 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 12 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 6 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 47 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 74 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 52 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 119 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 43 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 72 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 65 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 101 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 44 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 96 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 4 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 108 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 21 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 113 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 92 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 105 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 117 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 9 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 49 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 73 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 40 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 39 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 57 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 17 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 121 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 80 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 85 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 37 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 24 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 25 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 11 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 58 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 95 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 91 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 45 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 60 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 97 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 78 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 19 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 87 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 62 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 66 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 22 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 98 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 106 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 122 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 71 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 82 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 79 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 53 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 14 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 115 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 111 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 103 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 27 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 7 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 90 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 29 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 30 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 42 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 118 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 15 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 54 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 50 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 33 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 34 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 68 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 28 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
+Rank: 3 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 0 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 125 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 124 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 1 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 2 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 126 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 127 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+[2021-10-18 04:45:59,398] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
+[2021-10-18 04:45:59,399] [INFO] [utils.py:807:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
+[2021-10-18 04:45:59,399] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.74 GB, percent = 21.2%
+[2021-10-18 04:45:59,444] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
+[2021-10-18 04:45:59,445] [INFO] [utils.py:807:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB
+[2021-10-18 04:45:59,445] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.74 GB, percent = 21.2%
+[2021-10-18 04:45:59,445] [INFO] [stage2.py:474:__init__] optimizer state initialized
+[2021-10-18 04:45:59,473] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
+[2021-10-18 04:45:59,474] [INFO] [utils.py:807:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB
+[2021-10-18 04:45:59,474] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.74 GB, percent = 21.2%
+[2021-10-18 04:45:59,474] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
+[2021-10-18 04:45:59,474] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
+[2021-10-18 04:45:59,474] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
+[2021-10-18 04:45:59,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
+[2021-10-18 04:45:59,475] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] activation_checkpointing_config {
+ "partition_activations": false,
+ "contiguous_memory_optimization": false,
+ "cpu_checkpointing": false,
+ "number_checkpoints": null,
+ "synchronize_checkpoint_boundary": false,
+ "profile": false
+}
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] amp_enabled .................. False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] amp_params ................... False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] curriculum_enabled ........... True
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}}
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] dataloader_drop_last ......... False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] disable_allgather ............ False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] dump_state ................... False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
+[2021-10-18 04:45:59,475] [INFO] [config.py:944:print] elasticity_enabled ........... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] flops_profiler_config ........ {
+ "enabled": false,
+ "profile_step": 1,
+ "module_depth": -1,
+ "top_modules": 1,
+ "detailed": true,
+ "output_file": null
+}
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] fp16_enabled ................. True
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] global_rank .................. 0
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] gradient_accumulation_steps .. 2048
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] loss_scale ................... 0
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] memory_breakdown ............. False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] optimizer_name ............... None
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] optimizer_params ............. None
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] pld_enabled .................. False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] pld_params ................... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] prescale_gradients ........... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_groups .............. 1
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_offset .............. 1000
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_period .............. 1000
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_rounding ............ 0
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_start_bits .......... 16
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_target_bits ......... 8
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_training_enabled .... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_type ................ 0
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] quantize_verbose ............. False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] scheduler_name ............... None
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] scheduler_params ............. None
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] sparse_attention ............. None
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] steps_per_print .............. 2000
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] tensorboard_enabled .......... False
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] tensorboard_output_path ......
+[2021-10-18 04:45:59,476] [INFO] [config.py:944:print] train_batch_size ............. 2048
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] world_size ................... 1
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] zero_config .................. {
+ "stage": 1,
+ "contiguous_gradients": true,
+ "reduce_scatter": true,
+ "reduce_bucket_size": 5.000000e+08,
+ "allgather_partitions": true,
+ "allgather_bucket_size": 5.000000e+08,
+ "overlap_comm": false,
+ "load_from_fp32_weights": true,
+ "elastic_checkpoint": true,
+ "offload_param": null,
+ "offload_optimizer": null,
+ "sub_group_size": 1.000000e+09,
+ "prefetch_bucket_size": 5.000000e+07,
+ "param_persistence_threshold": 1.000000e+05,
+ "max_live_parameters": 1.000000e+09,
+ "max_reuse_distance": 1.000000e+09,
+ "gather_fp16_weights_on_model_save": false,
+ "ignore_unused_parameters": true,
+ "round_robin_gradients": false,
+ "legacy_stage1": false
+}
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] zero_enabled ................. True
+[2021-10-18 04:45:59,477] [INFO] [config.py:944:print] zero_optimization_stage ...... 1
+[2021-10-18 04:45:59,477] [INFO] [config.py:946:print] json = {
+ "train_micro_batch_size_per_gpu": 1,
+ "train_batch_size": 2.048000e+03,
+ "gradient_clipping": 1.0,
+ "zero_optimization": {
+ "stage": 1
+ },
+ "fp16": {
+ "enabled": true,
+ "loss_scale": 0,
+ "loss_scale_window": 500,
+ "hysteresis": 2,
+ "min_loss_scale": 1,
+ "initial_scale_power": 12
+ },
+ "curriculum_learning": {
+ "enabled": true,
+ "curriculum_type": "seqlen",
+ "min_difficulty": 64,
+ "max_difficulty": 2.048000e+03,
+ "schedule_type": "fixed_linear",
+ "schedule_config": {
+ "total_curriculum_step": 3.600000e+04,
+ "difficulty_step": 8
+ }
+ },
+ "steps_per_print": 2.000000e+03,
+ "wall_clock_breakdown": false
+}
+[2021-10-18 04:45:59,477] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1
+[2021-10-18 04:45:59,864] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,865] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-18 04:45:59,960] [WARNING] [engine.py:2020:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
+[2021-10-18 04:45:59,960] [WARNING] [engine.py:2020:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[... the same warning repeated verbatim by every other rank ...]
+WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ will not load any checkpoints and will start from random
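[editor's note: the warning above is DeepSpeed's checkpoint loader reporting that, with no explicit tag given, it could not read the one-line "latest" file that names which checkpoint tag to load. A minimal sketch of that lookup, with a hypothetical helper name, not DeepSpeed's actual code:]

    import os

    def resolve_checkpoint_tag(load_dir, tag=None):
        """Sketch of the lookup the warning above describes."""
        if tag is not None:
            return tag                          # an explicit tag wins
        latest_file = os.path.join(load_dir, "latest")
        if not os.path.isfile(latest_file):
            return None                         # the branch this run hit:
                                                # no checkpoint, random init
        with open(latest_file) as f:
            return f.read().strip()             # e.g. "global_step1000"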
+time (ms) | load-checkpoint: 0.56
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
+  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
+estimated model parameters: 103.3650944
[... the same UserWarning and a per-rank estimate repeated by the remaining ranks; the distinct estimates seen are 103.3650944, 125.2213504 and 125.22432 ...]
without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as 
the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.22432estimated model parameters: 125.22432 + +estimated model parameters: 125.22432 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be 
inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.2213504 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several 
copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + + 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and 
last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: 
Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.368064 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.368064 +estimated model parameters without embeddings: 103.368064estimated model parameters without embeddings: 103.368064 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 + +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model 
parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-18 04:45:59 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 
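
The per-rank estimates above follow the standard transformer parameter approximation. Below is a minimal Python sketch of how such figures arise, and of why the "with embeddings" count is flagged as inaccurate under PP > 1: the first and last pipeline stages each hold a copy of the embedding tables, so summing per-rank counts double-counts them. Only the 64-layer count is taken from this log; the hidden size, vocabulary size, and sequence length below are illustrative assumptions, not values printed here, which is why the result lands near, rather than on, the logged numbers. This is not the actual megatron/utils.py code.

    def estimate_params_billions(num_layers, hidden, vocab, seq_len, embedding_copies=1):
        # 12 * L * h^2 covers the attention and MLP weight matrices
        transformer = 12 * num_layers * hidden ** 2
        # token + position embedding tables (assumed sizes)
        embeddings = (vocab + seq_len) * hidden
        return (transformer + embedding_copies * embeddings) / 1e9

    # hypothetical 104B-class config: 64 layers, hidden 11600, vocab 50304, seq 2048
    print(estimate_params_billions(64, 11600, 50304, 2048))                      # ~103.9
    print(estimate_params_billions(64, 11600, 50304, 2048, embedding_copies=2))  # ~104.6, embeddings double-counted
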
+> building train, validation, and test datasets ...
+ > datasets target sizes (minimum size):
+    train:      600000000
+    validation: 3000320
+    test:       10240
+> building train, validation, and test datasets for GPT ...
+ > building dataset index ...
+    reading sizes...
+    reading pointers...
+    reading document index...
+    creating numpy buffer of mmap...
+    creating memory view of numpy buffer...
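
A generic sketch of the mmap-backed indexed-dataset pattern that the "reading sizes ... creating memory view" steps above refer to. The file layout here (int32 sizes, uint16 tokens, the paths and helper names) is an assumption for illustration, not Megatron-DeepSpeed's actual indexed_dataset format: the index supplies per-document sizes and byte pointers, and the token file is memory-mapped so any document can be sliced without loading the corpus into RAM.

    import numpy as np

    def open_indexed_dataset(idx_path, bin_path, num_docs, dtype=np.uint16):
        # reading sizes... (token count per document; int32 layout assumed)
        sizes = np.fromfile(idx_path, dtype=np.int32, count=num_docs)
        # reading pointers... (byte offset of each document in the data file)
        pointers = np.zeros(num_docs, dtype=np.int64)
        pointers[1:] = np.cumsum(sizes[:-1].astype(np.int64)) * np.dtype(dtype).itemsize
        # creating numpy buffer of mmap... / creating memory view of numpy buffer...
        data = np.memmap(bin_path, dtype=dtype, mode="r")
        return sizes, pointers, data

    def get_document(sizes, pointers, data, i, dtype=np.uint16):
        # slice one document out of the memory-mapped token buffer
        start = pointers[i] // np.dtype(dtype).itemsize
        return data[start:start + sizes[i]]
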
+ > finished creating indexed dataset in 0.038674 seconds
+    number of documents: 304230423
+ > dataset split:
+    train:
+     document indices in [0, 288714672) total of 288714672 documents
+    validation:
+     document indices in [288714672, 303926193) total of 15211521 documents
+    test:
+     document indices in [303926193, 304230423) total of 304230 documents
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.093 seconds
+    total number of samples: 657686117
+    total number of epochs: 5
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.057 seconds
+    total number of samples: 6927161
+    total number of epochs: 1
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
+    loaded indexed file in 0.003 seconds
+    total number of samples: 137384
+    total number of epochs: 1
+> finished creating GPT datasets ...
+[after dataloaders are built] datetime: 2021-10-18 04:46:04
+done with setup ...
+training ...
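
The split bookkeeping above is internally consistent; every number in the check below is taken from the log, and the recovered ratios suggest a 949/50/1-style split (an inference from the boundaries, not a flag read from the launch script):

    train_docs, valid_docs, test_docs = 288_714_672, 15_211_521, 304_230
    total = train_docs + valid_docs + test_docs
    assert total == 304_230_423  # matches "number of documents" above
    for name, n in [("train", train_docs), ("validation", valid_docs), ("test", test_docs)]:
        print(f"{name}: {n / total:.4f}")  # -> 0.9490, 0.0500, 0.0010

    # 5 epochs of the train split yield 657,686,117 samples, the smallest whole
    # number of epochs that covers the 600,000,000 requested 2048-token sequences
    samples_total, epochs = 657_686_117, 5
    assert samples_total * (epochs - 1) / epochs < 600_000_000 <= samples_total
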
+Number of parameters: 103.3650944 billion
+Number of parameters: 125.2213504 billion
+Number of parameters: 125.22432 billion
+Number of parameters without embeddings: 103.3650944 billion
+Number of parameters without embeddings: 103.368064 billion
+time (ms) | model-and-optimizer-setup: 4874.10 | train/valid/test-data-iterators-setup: 4181.58
+[before the start of training step] datetime: 2021-10-18 04:46:04
+[2021-10-18 04:46:04,758] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
+[2021-10-18 04:46:04,759] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
+[2021-10-18 04:46:04,759] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers
+[2021-10-18 04:46:04,759] [INFO] [checkpointing.py:554:forward] ----Synchronization False
+[2021-10-18 04:46:04,759] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
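
The five Activation Checkpointing INFO lines above map one-to-one onto DeepSpeed's "activation_checkpointing" config section. The dict below is a reconstruction inferred from the log output, not a copy of this run's actual ds_config:

    ds_activation_checkpointing = {
        "partition_activations": False,            # ----Partition Activations False
        "cpu_checkpointing": False,                # ----CPU CHECKPOINTING False
        "contiguous_memory_optimization": False,   # ----contiguous Memory Checkpointing False
        "synchronize_checkpoint_boundary": False,  # ----Synchronization False
        "profile": False,                          # ----Profiling time in checkpointing False
    }
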
+srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
+Killing subprocess 756918
+Killing subprocess 756919
+Killing subprocess 756920
+Killing subprocess 756921
+Main process received SIGTERM, exiting
+slurmstepd: error: *** STEP 1587017.0 ON r6i4n4 CANCELLED AT 2021-10-18T04:51:36 ***
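The "Killing subprocess <pid>" / "Main process received SIGTERM, exiting" pairs repeat once per launcher instance on every node, because each runs the same shutdown path when slurm cancels the step. A minimal sketch of that pattern, illustrative only and not the actual deepspeed/torch launcher source:

    import signal
    import sys

    procs = []  # one subprocess.Popen per local rank, filled at launch time

    def sigkill_handler(signum, frame):
        # Forward the scheduler's SIGTERM to every worker this launcher spawned.
        for p in procs:
            print(f"Killing subprocess {p.pid}")
            try:
                p.kill()
            except OSError:
                pass
        print("Main process received SIGTERM, exiting")
        sys.exit(1)

    signal.signal(signal.SIGTERM, sigkill_handler)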
+*****************************************
+Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+*****************************************
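The banner above is the distributed launcher defaulting OMP_NUM_THREADS to 1 for every spawned rank. A minimal sketch of the tuning it asks for; the divisor is an assumption standing in for the number of ranks per node:

    import os

    RANKS_PER_NODE = 4  # hypothetical; match your launcher's procs per node
    os.environ.setdefault(
        "OMP_NUM_THREADS",
        str(max(1, (os.cpu_count() or 1) // RANKS_PER_NODE)),
    )

    import torch  # import after setting the env var so it takes effect
    print(torch.get_num_threads())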
+***************************************** +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +------------------------------------------------------------------------------------------------------------------------------------------------------ + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- + +DeepSpeed C++/CUDA extension op report +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninja +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- +--------------------------------------------------JIT compiled ops requires ninja + +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report + + +DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. 
Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------JIT compiled ops requires ninja + +JIT compiled ops requires ninja +ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY] +[OKAY] + +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + +-------------------------------------------------- +op name +op name op name................op name ................installed................ ................ installed ..installed installed ..compatible + ....--------------------------------------------------compatible +compatiblecompatible + + +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + +cpu_adam ............... [NO] ....... [OKAY]cpu_adamcpu_adam + .............................. cpu_adam[NO][NO] ............................. [NO] fused_adam [OKAY][OKAY] +....... +ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] +[OKAY] + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + +-------------------------------------------------- +op nameop name + op name ................ op name................................ installed................installed installed .. installedcompatible.... + compatible--------------------------------------------------compatible.. + + + --------------------------------------------------compatible-------------------------------------------------- + + +-------------------------------------------------- +cpu_adam ...............cpu_adam [NO]cpu_adam ...............cpu_adam....... ............... [NO]...............[NO][OKAY] +.......[NO]....... [OKAY] ....... +[OKAY] fused_adam +[OKAY] +............. [NO] ....... [OKAY] +fused_adamfused_adamfused_lamb .......................................fused_adam [NO] [NO] [NO]............. ....... .............. [OKAY][NO][OKAY] + +[OKAY]....... + fused_lamb[OKAY] +.............fused_lamb [NO]............. fused_lamb....... [NO].............sparse_attn[OKAY] + ...................[NO] [OKAY][NO]....... + .......[OKAY] +[OKAY] +sparse_attn ............transformer [NO]............ sparse_attn[NO]....... ...................[OKAY] +[NO][OKAY]sparse_attn +transformer....... ............[OKAY] stochastic_transformer............ + [NO][NO].transformer .......[NO]............ .......[NO].......[OKAY] +.......[OKAY][OKAY] + +[OKAY]stochastic_transformer + .transformer stochastic_transformer [NO] ............ .......[NO]. [OKAY][NO] + ....... [OKAY] + ....... [OKAY] +stochastic_transformer . [NO] ....... [OKAY] + .............[OKAY] +[NO] ....... [OKAY] +fused_adamfused_lambfused_adamfused_adam .................................................... [NO][NO][NO] .......[NO] .............. .......[OKAY][OKAY] [OKAY][OKAY] + + + +fused_lambfused_lamb ............. fused_lamb ............. [NO] ............. [NO].......sparse_attn [OKAY][NO] + .......................... [NO][OKAY] [OKAY] + +....... [OKAY] +sparse_attn ............ [NO]transformer ................... [NO][OKAY] +sparse_attn.......sparse_attn transformer .................................... [OKAY] [NO] +[NO] [NO] ....... 
....... stochastic_transformer .......[OKAY][OKAY]. + + [OKAY][NO] + transformer.......stochastic_transformer ............[OKAY]transformer. + ............[NO] [NO] [NO] ..................... [OKAY] [OKAY] +[OKAY] + +stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja + + +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +-------------------------------------------------- +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +ninjaninjaninjaninja ...................................................... [OKAY][OKAY].................. + + [OKAY]----------------------------------------------------------------------------------------------------[OKAY] + + + +--------------------------------------------------op nameop name --------------------------------------------------................ + + op nameinstalledop name................ .................. ................ installed compatibleinstalled installed + ..--------------------------------------------------.... + compatiblecompatiblecompatible + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +cpu_adam + + ............... [NO] ....... [OKAY] +cpu_adam cpu_adamcpu_adam............... ..............................[NO] [NO][NO]....... fused_adam ....... ....... [OKAY] [OKAY] +[OKAY]............. + + [NO] ....... [OKAY] +fused_lamb fused_adam............. fused_adam[NO]fused_adam............. .......................... .......[NO] [NO] [NO] ....... [OKAY]..............[OKAY] + +[OKAY][OKAY] + +fused_lambfused_lambfused_lamb ....................................... [NO]sparse_attn[NO][NO] ................................. [NO] [OKAY][OKAY][OKAY] + + +....... [OKAY] +transformer ............ [NO] ....... [OKAY] +sparse_attnsparse_attn sparse_attn ............ ............ stochastic_transformer............ [NO] [NO][NO] ...................... [OKAY] +[NO][OKAY][OKAY] + +transformer....... transformer............transformer[OKAY] +............[NO] ............ 
[NO]....... [NO].......[OKAY] .......[OKAY] + +[OKAY] +stochastic_transformer stochastic_transformerstochastic_transformer. .[NO]. [NO].......[NO] .......[OKAY]....... + [OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +--------------------------------------------------DeepSpeed C++/CUDA extension op report +-------------------------------------------------- + + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +DeepSpeed C++/CUDA extension op report +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja + + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja + + +JIT compiled ops requires ninja +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninja ninja .................................... ....................................[OKAY][OKAY] +[OKAY] +[OKAY]-------------------------------------------------- +-------------------------------------------------- + + +--------------------------------------------------op name--------------------------------------------------op name + + ................................ op nameinstalledop name installed.................................. compatible ..installed +installed -------------------------------------------------- compatible.... + + --------------------------------------------------compatiblecompatible + + +---------------------------------------------------------------------------------------------------- + +cpu_adam ............... cpu_adam[NO] ......................cpu_adam cpu_adam [OKAY]............... + [NO]............... [NO].......[NO] .......[OKAY]....... + [OKAY][OKAY] +fused_adam + ............. [NO] ....... [OKAY] +fused_adam ............. fused_adam[NO] fused_adam fused_lamb.................... ..........................[OKAY] [NO] +[NO][NO] fused_lamb....... .............. ............. [OKAY][OKAY][OKAY][NO] + + +....... fused_lamb[OKAY] +fused_lamb .......................... [NO][NO] .............. [OKAY]sparse_attn[OKAY] + +............sparse_attn [NO]............ .......[NO] [OKAY]....... + [OKAY] +transformer transformer............ ............sparse_attn[NO] sparse_attn [NO] ...................................... [OKAY][NO][OKAY][NO] + + .............. [OKAY][OKAY] +stochastic_transformerstochastic_transformer + transformer ..transformer............ [NO]............[NO][NO] ..............[NO]....... 
[OKAY].......[OKAY][OKAY] + + +[OKAY] +stochastic_transformerstochastic_transformer .. [NO][NO] .............. [OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- + +--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + +--------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +-------------------------------------------------- +JIT compiled ops requires ninja-------------------------------------------------- +JIT compiled ops requires ninja +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY] +[OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- + +-------------------------------------------------- +-------------------------------------------------- +op name +op name op nameop name................ ................................installed................ installedinstalled..installed .... compatible compatiblecompatible +.. +-------------------------------------------------- +-------------------------------------------------- + +--------------------------------------------------compatible + +-------------------------------------------------- +cpu_adamcpu_adam cpu_adam..............................cpu_adam [NO] ............... ...............[NO] ..............[NO] [NO] [OKAY][OKAY] ....... + +....... [OKAY][OKAY] + +fused_adamfused_adam .......................... fused_adamfused_adam[NO] [NO] .......................... ....... ....... [NO][OKAY][NO][OKAY] + + .............. fused_lambfused_lamb[OKAY] [OKAY] +.......................... + [NO][NO]fused_lamb ..............fused_lamb [OKAY] ............. +[OKAY]............. + [NO][NO] .............. [OKAY][OKAY] + +sparse_attn sparse_attn............ ............[NO] [NO]....... sparse_attnsparse_attn....... [OKAY] ........................ + [OKAY] [NO] +[NO]transformer .......transformer................... ............[OKAY][NO] [OKAY] + [NO] +....... transformer....... transformer[OKAY] ............ + [OKAY]............[NO] + [NO].......stochastic_transformer [OKAY]stochastic_transformer....... + . .[NO][OKAY] stochastic_transformer[NO] + ............... [OKAY]stochastic_transformer + [OKAY] + [NO] ........ 
[OKAY][NO] + ....... [OKAY] +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +DeepSpeed C++/CUDA extension op report------------------------------------------------------------------------------------------------------------------------------------------------------ + + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +JIT compiled ops requires ninja-------------------------------------------------- + +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] + [OKAY] +[OKAY] +---------------------------------------------------------------------------------------------------- + +-------------------------------------------------- + +op name--------------------------------------------------op name +op name................................ op name ................ installedinstalled................ installed.. ..installed ..compatible compatible +..--------------------------------------------------compatible + +-------------------------------------------------- + +compatible-------------------------------------------------- + +-------------------------------------------------- +cpu_adam cpu_adam............... cpu_adam ...............cpu_adam [NO] [NO]............... ...................... ....... [NO] [NO] [OKAY][OKAY] ....... + +....... [OKAY][OKAY] + +fused_adamfused_adam .............fused_adamfused_adam............. .............[NO].............[NO] [NO][NO].............. .......[OKAY].......[OKAY] + +[OKAY][OKAY] + +fused_lambfused_lamb fused_lamb..........................fused_lamb [NO][NO]............. ............. ....... ....... [NO][NO] [OKAY] [OKAY] + ....... +....... [OKAY][OKAY] + +sparse_attnsparse_attn ........................sparse_attnsparse_attn [NO]............[NO]............ [NO].............. [NO] [OKAY].......[OKAY] + +.......[OKAY] +[OKAY]transformertransformer +transformer ........................ transformer ............[NO] [NO] ............[NO] ....... .......[NO]....... [OKAY]....... [OKAY] + [OKAY] +[OKAY] + +stochastic_transformerstochastic_transformerstochastic_transformer stochastic_transformer ... . [NO][NO][NO] [NO]..................... 
[OKAY] .......[OKAY] + +[OKAY][OKAY] + +---------------------------------------------------------------------------------------------------- +DeepSpeed C++/CUDA extension op report + +--------------------------------------------------DeepSpeed C++/CUDA extension op report + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +DeepSpeed C++/CUDA extension op report + +JIT compiled ops requires ninja +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +JIT compiled ops requires ninja +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................................... ..................[OKAY][OKAY] [OKAY] + +[OKAY] + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + + +op nameop nameop name op name ................ ................................ ................ installed installedinstalledinstalled .. .. .... compatible +compatiblecompatiblecompatible +-------------------------------------------------- + +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + +cpu_adam ...............cpu_adam cpu_adam[NO] cpu_adam.............................. .......[NO]...............[NO] [OKAY].......[NO]....... + .......[OKAY][OKAY] + +[OKAY] +fused_adam ............. [NO] .......fused_adamfused_adam [OKAY].............fused_adam............. + [NO].............[NO] fused_lamb ....... [NO] .................... [OKAY] ....... +[NO][OKAY] +[OKAY]....... +fused_lamb [OKAY]fused_lamb............. +fused_lamb ............. [NO] ............. [NO] ....... [NO] ..............[OKAY] +[OKAY][OKAY]sparse_attn + + ............ [NO] ....... [OKAY] +transformer ............ [NO] .......sparse_attn sparse_attn [OKAY] sparse_attn........................ +............[NO][NO] [NO]....... .......stochastic_transformer ....... [OKAY] [OKAY] +.[OKAY] + +[NO]transformertransformer transformer ................... ........................[NO][OKAY] + [NO][NO]....... [OKAY].............. + [OKAY][OKAY] + +stochastic_transformer stochastic_transformer.stochastic_transformer [NO]. . ....... [NO][NO][OKAY] + .............. 
[OKAY][OKAY] + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + +DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- + + +JIT compiled ops requires ninja-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja ...................................................... [OKAY]..................[OKAY][OKAY] + + +[OKAY]---------------------------------------------------------------------------------------------------- +-------------------------------------------------- +op name + + --------------------------------------------------................op nameop name + installed op name................ ................ installed.. installed................compatible.. +..installedcompatible --------------------------------------------------compatible + +.. +---------------------------------------------------------------------------------------------------- + +compatible +-------------------------------------------------- +cpu_adam ............... cpu_adam[NO]cpu_adam .......cpu_adam.............................. [OKAY]...............[NO] + [NO]....... .......[NO] [OKAY][OKAY]....... + + fused_adam[OKAY] +............. [NO] ....... [OKAY]fused_adam + ............. [NO] fused_lamb....... fused_adam[OKAY].............fused_adam + .............[NO] .................... fused_lamb [NO][OKAY][NO] +............. ....... ....... [NO] [OKAY] [OKAY] +....... + [OKAY] +fused_lambfused_lamb ..........................sparse_attn [NO][NO]............ .......[NO]....... .......sparse_attn[OKAY] [OKAY] +............[OKAY] + + [NO] .......transformer [OKAY]............ + [NO]transformer ................... sparse_attn[OKAY][NO] sparse_attn + ............ ....... ............[NO][OKAY]stochastic_transformer +[NO] ..............stochastic_transformer . [OKAY] [OKAY][NO]. + +....... transformer [NO]transformer[OKAY] +............ ....... ............ [NO] [OKAY] +[NO]....... .......[OKAY] +[OKAY] +stochastic_transformerstochastic_transformer .. [NO] [NO]....... 
.......[OKAY] +[OKAY] +------------------------------------------------------------------------------------------------------------------------------------------------------ + +-------------------------------------------------- +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + + +----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +-------------------------------------------------- + +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- + + +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op report +DeepSpeed C++/CUDA extension op report-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + +--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +JIT compiled ops requires ninja +-------------------------------------------------- +JIT compiled ops requires ninja +---------------------------------------------------------------------------------------------------- + +DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + +---------------------------------------------------------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report +-------------------------------------------------- + + +------------------------------------------------------------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- + +-------------------------------------------------- +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +JIT compiled ops requires ninja +JIT compiled ops requires ninja +-------------------------------------------------- + +JIT compiled ops requires ninja +------------------------------------------------------------------------------------------------------------------------------------------------------ + +--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report + + +DeepSpeed C++/CUDA extension op report +---------------------------------------------------------------------------------------------------- + + +DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + + +----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.-------------------------------------------------- + + + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja +-------------------------------------------------- + +-------------------------------------------------- + +JIT compiled ops requires ninjaJIT compiled ops requires ninja + +ninjaninjaninjaninja ...................................................... .................. 
[OKAY] +[OKAY][OKAY][OKAY]-------------------------------------------------- + + + +---------------------------------------------------------------------------------------------------- +--------------------------------------------------op name + + op name................op name op name ................................installed ................ ..installedinstalledinstalled compatible .. +.. --------------------------------------------------compatible +.. +compatible -------------------------------------------------- +compatible + +---------------------------------------------------------------------------------------------------- + +cpu_adam ............... [NO]cpu_adam ....... cpu_adam...............[OKAY]cpu_adam +............... [NO]...............[NO] .......[NO]....... [OKAY].......fused_adam[OKAY] + +[OKAY]............. + [NO] ....... [OKAY] +fused_adamfused_lamb fused_adam..........................fused_adam .............[NO][NO]............. .......[NO][NO] ....... ....... [OKAY].......[OKAY][OKAY] + + +[OKAY] +fused_lambfused_lambfused_lamb ....................................... [NO][NO][NO]sparse_attn ................................. [OKAY][NO] [OKAY].......[OKAY] + + +[OKAY] +transformer ............ [NO] ....... sparse_attnsparse_attn[OKAY] sparse_attn............ + ........................[NO] [NO]stochastic_transformer [NO] ....... ....... ....... . [OKAY][OKAY][OKAY][NO] + + +....... transformer[OKAY]transformertransformer + .................................... [NO][NO][NO] ..................... [OKAY][OKAY][OKAY] + + +stochastic_transformerstochastic_transformer stochastic_transformer .. . [NO] [NO] [NO] .............. [OKAY].......[OKAY] + +[OKAY] +-------------------------------------------------- +DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. +---------------------------------------------------------------------------------------------------- +JIT compiled ops requires ninja + +DeepSpeed C++/CUDA extension op report-------------------------------------------------- +-------------------------------------------------- + +NOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + +JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + + +---------------------------------------------------------------------------------------------------- + +JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at + runtime if needed. Op compatibility means that your system + meet the required dependencies to JIT install the op. + +-------------------------------------------------- +JIT compiled ops requires ninja +ninjaninjaninjaninja .................. .................................... .................. 
[OKAY] [OKAY] [OKAY] +[OKAY] + + +------------------------------------------------------------------------------------------------------------------------------------------------------ + + +--------------------------------------------------op nameop nameop name + ................................................ op name installedinstalled installed .................... compatible..compatibleinstalled + +-------------------------------------------------- compatible-------------------------------------------------- + + +..-------------------------------------------------- +compatible +ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] +--------------------------------------------------cpu_adam +[OKAY] + + +------------------------------------------------------------------------------------------------------------------------------------------------------ +-------------------------------------------------- + + +cpu_adam cpu_adam.............................. [NO]............... [NO] ..............cpu_adam [NO] [OKAY] +[OKAY]...................... + [OKAY][NO] +op nameop nameop name op name ................................ ................ ................installedinstalledinstalled ..installed.. .. ..compatiblecompatiblecompatible + + ....... [OKAY] +fused_adam .............fused_adam [NO]............. fused_adam....... [NO].............[OKAY] + +compatible---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + +....... fused_adam[NO][OKAY] fused_lamb +....... .......................... [OKAY][NO]fused_lamb[NO] +-------------------------------------------------- + ........................... fused_lamb[NO][OKAY] +[OKAY].................... +cpu_adamcpu_adamcpu_adam cpu_adam.............................. ............... ............... [NO] [NO][NO] [NO] .............. ..............[OKAY] [OKAY] +[OKAY] +[OKAY] + + [NO][OKAY] + ....... fused_lamb[OKAY] +fused_adam .............fused_adam fused_adamfused_adam [NO] ............. ............. .................... [NO] [NO][OKAY][NO]....... +............. sparse_attn[NO] ............ .......[NO] sparse_attn ....... sparse_attn ............[OKAY] [OKAY] + + ....... ....... [OKAY]fused_lamb +[OKAY] [OKAY] +............. + [NO] .......fused_lamb fused_lamb fused_lamb[OKAY] ............. +............[NO] transformer [NO] ....... ............ ....... [OKAY][NO] + [OKAY].......transformer +.......................... [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] +[OKAY] +sparse_attn + ............ [NO] ....... [OKAY] + [OKAY]sparse_attn +transformer ............ sparse_attn[NO]sparse_attn sparse_attn ............................... [OKAY][NO] +............ [NO] ....... [NO] ....... stochastic_transformer[OKAY] +............transformer ............[NO]stochastic_transformer............ [NO].[NO]....... .......[NO].......[OKAY] +.......[OKAY] [OKAY] +[OKAY] + +[OKAY]....... + transformer. [OKAY] ............ +stochastic_transformer stochastic_transformer .transformer. [NO][NO]............ ....... .......[OKAY][NO] + [OKAY] +....... [OKAY] +transformer[NO] transformer[NO]............ ..........................[NO] [OKAY][OKAY].......[NO] + + [OKAY]....... + [OKAY]stochastic_transformer +stochastic_transformer . [NO] ....... [OKAY] + stochastic_transformer stochastic_transformer. .[NO] .[NO] .......[NO]....... [OKAY][OKAY]....... 
+ + [OKAY] +ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] +[OKAY] + +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- +-------------------------------------------------- + + +op nameop nameop name op name ................ ................................ ................installed installedinstalledinstalled.. ...... compatible compatiblecompatiblecompatible + + + +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + + +cpu_adamcpu_adamcpu_adamcpu_adam ............... ............... ...............[NO] ...............[NO].......[NO] ....... [OKAY].......[NO] + [OKAY] ....... +[OKAY] +[OKAY] +fused_adam ............. fused_adam fused_adam[NO]fused_adam ............. .................... .............[NO] [NO][OKAY] [NO] +.............. [OKAY]fused_lamb.......[OKAY] + [OKAY] +............. + [NO]fused_lamb fused_lamb .......fused_lamb ............. ............. [OKAY]............. [NO] + [NO] [NO] ....... ....... ....... [OKAY][OKAY][OKAY] + + +sparse_attn ............ [NO] ....... [OKAY] +transformersparse_attnsparse_attnsparse_attn ............ ........................ ............ [NO][NO][NO] [NO] ..................... .......[OKAY] [OKAY] [OKAY] + +[OKAY] + +stochastic_transformertransformertransformer transformer ............ ............. ............ [NO][NO] [NO].......[NO] [OKAY].............. +....... [OKAY][OKAY][OKAY] +stochastic_transformer + + . stochastic_transformerstochastic_transformer [NO] ......... [OKAY][NO][NO] + .............. [OKAY][OKAY] + +ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] +[OKAY] + +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + +--------------------------------------------------op nameop name + op name................op name ................installed ................ ..................installed compatibleinstalledinstalled.. + .. --------------------------------------------------..compatible + +compatiblecompatible +-------------------------------------------------- +-------------------------------------------------- +-------------------------------------------------- + +cpu_adam ............... [NO] .......cpu_adam cpu_adam [OKAY]............... cpu_adam +............... [NO]...............[NO] .......[NO]....... [OKAY].......[OKAY]fused_adam + + [OKAY]............. +[NO] ....... [OKAY] +fused_adam fused_lambfused_adam.............fused_adam .......................................[NO] .......[NO][NO] [NO] ....... [OKAY] +..............[OKAY] + [OKAY][OKAY] +fused_lamb + ............. [NO]fused_lambfused_lamb ................................. [OKAY]sparse_attn[NO][NO] + .......................... [OKAY][NO][OKAY] + +....... [OKAY] +sparse_attntransformer ........................ [NO][NO] ..............sparse_attn sparse_attn [OKAY][OKAY] + + ........................ transformer[NO]stochastic_transformer[NO] ................... ........[NO][OKAY] +[NO].......[OKAY] transformer....... +[OKAY]............[OKAY] + +transformer[NO] stochastic_transformer................... [OKAY][NO]. + .......[NO] .......stochastic_transformer [OKAY] [OKAY] + +. 
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meets the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops require ninja
+--------------------------------------------------
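+
+The NOTE above describes DeepSpeed's JIT mechanism: an op listed as compatible
+but not installed is compiled with ninja the first time it is used. A minimal
+sketch of the same check, assuming the DeepSpeed 0.5.x op-builder API (builder
+names and methods may differ in other versions):
+
+    # check_ops.py - mirror the op-report table programmatically
+    from deepspeed.ops.op_builder import CPUAdamBuilder, FusedAdamBuilder
+
+    for builder in (CPUAdamBuilder(), FusedAdamBuilder()):
+        # is_compatible() is what the table renders as "compatible" / [OKAY];
+        # builder.load() would trigger the actual ninja JIT build.
+        print(builder.NAME, builder.is_compatible())
+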
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [NO] ....... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
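+
+async_io is the one op reported as not compatible ([NO] in both columns), and
+the warnings above name the fix: install the libaio development package (this
+cluster uses yum's libaio-devel; on Debian-based systems it is libaio-dev).
+A hedged sketch for re-checking after installing it, assuming the same
+op-builder API as above:
+
+    from deepspeed.ops.op_builder import AsyncIOBuilder
+
+    # Should print True once the libaio headers/.so are visible to the
+    # compiler (CFLAGS/LDFLAGS can point at a non-standard install prefix).
+    print(AsyncIOBuilder().is_compatible())
+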
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
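+
+This summary is what DeepSpeed's ds_report entry point prints. A minimal
+stand-alone equivalent (the deepspeed dunder attributes are assumed from the
+0.5.x source and may differ in other releases):
+
+    import torch
+    import deepspeed
+
+    print("torch version .....", torch.__version__)   # 1.8.1 in this log
+    print("torch cuda version.", torch.version.cuda)  # 11.1 in this log
+    print("deepspeed info ....", deepspeed.__version__,
+          deepspeed.__git_hash__, deepspeed.__git_branch__)
+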
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
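The environment block is DeepSpeed's standard report (the same information the ds_report utility prints). The key fields can be regenerated from a live interpreter, assuming the same conda environment is active:

import torch
import deepspeed

print("torch version .....", torch.__version__)     # 1.8.1 in this log
print("torch cuda version ", torch.version.cuda)    # 11.1 in this log
print("deepspeed info ....", deepspeed.__version__) # 0.5.5+57dee5a here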
+**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ****
+DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.transformer_inference .. + [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +async_io ............... [NO]quantizer ..................... [NO] ....... [NO][OKAY] + +-------------------------------------------------- +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... 
[OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] .......  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.[OKAY] + +utils .................. [NO] ....... 
[OKAY] +async_ioquantizer ............................. [NO][NO] .............. [OKAY][NO] + +-------------------------------------------------- +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. 
['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix0.5.5+57dee5a, 57dee5a, pp_deadlock_fix + +deepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 +torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 
11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +transformer_inference .. [NO] ....... [OKAY] +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +quantizer .............. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +-------------------------------------------------- +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda versiontorch cuda version .............................. 11.111.1 + +nvcc versionnvcc version .......................................... 11.211.2 + +deepspeed install pathdeepspeed install path ...................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed infodeepspeed info ...................................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix0.5.5+57dee5a, 57dee5a, pp_deadlock_fix + +deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 + +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 
11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ...............DeepSpeed general environment info: 11.1 +nvcc version +..................... 11.2 +deepspeed install path torch install path........... ............... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']0.5.5+57dee5a, 57dee5a, pp_deadlock_fix + +deepspeed wheel compiled w.torch version ...... ....................torch 1.8, cuda 11.1 +1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix + [WARNING]  async_io: please install the libaio-devel package with yum +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... 
['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install path torch install path............... ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']torch version + .................... torch version1.8.1 +.................... torch cuda version1.8.1 +............... torch cuda version11.1 +...............nvcc version 11.1..................... + nvcc version11.2 +.....................deepspeed install path 11.2........... + deepspeed install path ...........['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']................... + 0.5.5+57dee5a, 57dee5a, pp_deadlock_fixdeepspeed info + deepspeed wheel compiled w.................... ......0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +torch 1.8, cuda 11.1 +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 
11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info:DeepSpeed general environment info: + +torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch versiontorch version ........................................ 1.8.11.8.1 + +torch cuda version torch cuda version............... ...............11.1 +11.1nvcc version + nvcc version..................... .....................11.2 +11.2 +deepspeed install path deepspeed install path........... ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']deepspeed info + deepspeed info................... ...................0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +0.5.5+57dee5a, 57dee5a, pp_deadlock_fixdeepspeed wheel compiled w. +......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... + torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ******** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
+DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1DeepSpeed general environment info: + +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + +async_ioasync_io .............................. [NO][NO] .............. [NO][NO] + +transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] + +utils utils.................. ..................[NO] [NO]....... .......[OKAY] +[OKAY] +quantizer .............. [NO]quantizer ..................... [NO][OKAY] +....... [OKAY] +-------------------------------------------------- +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... 
[OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... 
[OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum + + + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. + + +async_ioasync_ioasync_io ............... ............... ............... [NO] [NO][NO] ..................... [NO][NO] +[NO] + +transformer_inferencetransformer_inference transformer_inference.... ..[NO][NO] [NO].............. .......[OKAY][OKAY] + +[OKAY] +utilsutils ....................................utils [NO][NO].................. ..............[NO] [OKAY][OKAY]....... + +[OKAY] +quantizerquantizer ..............quantizer.............. [NO]..............[NO] ....... [NO] ....... [OKAY] ....... +[OKAY] +[OKAY] +-------------------------------------------------- +---------------------------------------------------------------------------------------------------- + + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +transformer_inference .. [NO] ....... [OKAY] +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
+async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +-------------------------------------------------- +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2DeepSpeed general environment info: +deepspeed install path +........... torch install path['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + deepspeed info............... ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w.['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +...... torch 1.8, cuda 11.1 +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. 
[NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 
11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +DeepSpeed general environment info: +torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +.................... torch version1.8.1 +.................... torch cuda version1.8.1 +............... 11.1torch cuda version + ...............nvcc version .....................11.1 +11.2nvcc version + deepspeed install path..................... ...........11.2 +deepspeed install path ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']........... + deepspeed info ................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']0.5.5+57dee5a, 57dee5a, pp_deadlock_fix + +deepspeed infodeepspeed wheel compiled w. ......................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fixtorch 1.8, cuda 11.1 + +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +DeepSpeed general environment info:torch install path +............... torch install path ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']............... + torch version .................... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']1.8.1 + +torch cuda versiontorch version ................................... 11.11.8.1 + +nvcc version .....................torch cuda version 11.2............... + deepspeed install path11.1 +...........nvcc version ..................... 
11.2['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] + +deepspeed install pathdeepspeed info .............................. 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']deepspeed wheel compiled w. + deepspeed info...... ...................torch 1.8, cuda 11.1 +0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... 
['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +DeepSpeed general environment info:torch install path +............... torch install path['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +............... torch version .................... 1.8.1['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] + +torch cuda version torch version............... ....................11.1 +1.8.1 +nvcc version .....................torch cuda version 11.2............... + deepspeed install path11.1 +...........nvcc version ..................... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']11.2 + +deepspeed install pathdeepspeed info .............................. 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']deepspeed wheel compiled w. + ......deepspeed info torch 1.8, cuda 11.1................... + 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + [WARNING]  async_io: please install the libaio-devel package with yum + [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. +async_io ............... [NO] ....... [NO] +transformer_inference .. [NO] ....... [OKAY] +utils .................. [NO] ....... [OKAY] +quantizer .............. [NO] ....... [OKAY] +-------------------------------------------------- +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +DeepSpeed general environment info: +torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch'] +torch version .................... 1.8.1 +torch cuda version ............... 11.1 +nvcc version ..................... 11.2 +deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed'] +deepspeed info ................... 
0.5.5+57dee5a, 57dee5a, pp_deadlock_fix +deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ******** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ******** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: 
git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ******** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ******** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** + +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +**** Git info for Megatron: git_hash=df7a9d9 git_branch=main **** +using world size: 128, data-parallel-size: 1, tensor-model-parallel size: 4, pipeline-model-parallel size: 32 +using torch.float16 for parameters ... +------------------------ arguments ------------------------ + accumulate_allreduce_grads_in_fp32 .............. False + adam_beta1 ...................................... 0.9 + adam_beta2 ...................................... 0.95 + adam_eps ........................................ 1e-08 + adlr_autoresume ................................. False + adlr_autoresume_interval ........................ 1000 + apply_query_key_layer_scaling ................... True + apply_residual_connection_post_layernorm ........ False + attention_dropout ............................... 0.1 + attention_softmax_in_fp32 ....................... False + bert_binary_head ................................ True + bert_load ....................................... None + bf16 ............................................ False + bias_dropout_fusion ............................. True + bias_gelu_fusion ................................ True + biencoder_projection_dim ........................ 0 + biencoder_shared_query_context_model ............ False + block_data_path ................................. None + checkpoint_activations .......................... True + checkpoint_in_cpu ............................... False + checkpoint_num_layers ........................... 1 + clip_grad ....................................... 1.0 + codecarbon_dir .................................. None + consumed_train_samples .......................... 0 + consumed_train_tokens ........................... 0 + consumed_valid_samples .......................... 0 + contigious_checkpointing ........................ False + cpu_optimizer ................................... False + cpu_torch_adam .................................. False + curriculum_learning ............................. False + data_impl ....................................... mmap + data_parallel_size .............................. 1 + data_path ....................................... 
['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] + dataloader_type ................................. single + DDP_impl ........................................ local + decoder_seq_length .............................. None + deepscale ....................................... False + deepscale_config ................................ None + deepspeed ....................................... True + deepspeed_activation_checkpointing .............. True + deepspeed_config ................................ ./ds_config.1645299.json + deepspeed_mpi ................................... False + distribute_checkpointed_activations ............. False + distributed_backend ............................. nccl + embedding_path .................................. None + encoder_seq_length .............................. 2048 + eod_mask_loss ................................... False + eval_interval ................................... 1000 + eval_iters ...................................... 5 + evidence_data_path .............................. None + exit_duration_in_mins ........................... 55 + exit_interval ................................... None + ffn_hidden_size ................................. 46400 + finetune ........................................ False + fp16 ............................................ True + fp16_lm_cross_entropy ........................... False + fp32_residual_connection ........................ False + gigaflos_no_embeds .............................. 0 + global_batch_size ............................... 2048 + glu_activation .................................. None + hidden_dropout .................................. 0.1 + hidden_size ..................................... 11600 + hysteresis ...................................... 2 + ict_head_size ................................... None + ict_load ........................................ None + img_dim ......................................... 224 + indexer_batch_size .............................. 128 + indexer_log_interval ............................ 1000 + init_method_std ................................. 0.02 + init_method_xavier_uniform ...................... False + initial_loss_scale .............................. 4294967296 + kv_channels ..................................... 145 + layernorm_epsilon ............................... 1e-05 + lazy_mpu_init ................................... None + load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints + local_rank ...................................... 0 + log_batch_size_to_tensorboard ................... True + log_interval .................................... 1 + log_learning_rate_to_tensorboard ................ True + log_loss_scale_to_tensorboard ................... True + log_num_zeros_in_grad ........................... False + log_params_norm ................................. False + log_timers_to_tensorboard ....................... True + log_validation_ppl_to_tensorboard ............... True + loss_on_targets_only ............................ False + loss_scale ...................................... 12.0 + loss_scale_window ............................... 1000 + lr .............................................. 6e-05 + lr_decay_iters .................................. None + lr_decay_samples ................................ None + lr_decay_style .................................. cosine + lr_decay_tokens ................................. 
260000000000 + lr_warmup_fraction .............................. None + lr_warmup_iters ................................. 0 + lr_warmup_samples ............................... 216320 + make_vocab_size_divisible_by .................... 128 + mask_prob ....................................... 0.15 + masked_softmax_fusion ........................... False + max_position_embeddings ......................... 2048 + memory_centric_tiled_linear ..................... False + merge_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-merges.txt + micro_batch_size ................................ 1 + min_loss_scale .................................. 1.0 + min_lr .......................................... 6e-06 + mmap_warmup ..................................... False + no_load_optim ................................... None + no_load_rng ..................................... None + no_save_optim ................................... None + no_save_rng ..................................... None + num_attention_heads ............................. 80 + num_channels .................................... 3 + num_classes ..................................... 1000 + num_layers ...................................... 64 + num_layers_per_virtual_pipeline_stage ........... None + num_workers ..................................... 2 + onnx_safe ....................................... None + openai_gelu ..................................... False + optimizer ....................................... adam + override_lr_scheduler ........................... False + params_dtype .................................... torch.float16 + partition_activations ........................... False + patch_dim ....................................... 16 + pipeline_model_parallel_size .................... 32 + position_embedding_type ......................... PositionEmbeddingType.absolute + profile_backward ................................ False + query_in_block_prob ............................. 0.1 + rampup_batch_size ............................... None + rank ............................................ 0 + remote_device ................................... none + reset_attention_mask ............................ False + reset_position_ids .............................. False + retriever_report_topk_accuracies ................ [] + retriever_score_scaling ......................... False + retriever_seq_length ............................ 256 + sample_rate ..................................... 1.0 + save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints + save_interval ................................... 300 + scatter_gather_tensors_in_pipeline .............. True + scattered_embeddings ............................ False + seed ............................................ 43 + seq_length ...................................... 2048 + sgd_momentum .................................... 0.9 + short_seq_prob .................................. 0.1 + split ........................................... 949,50,1 + split_transformers .............................. False + synchronize_each_layer .......................... False + tensor_model_parallel_size ...................... 4 + tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/tr8b-104B-logs/tensorboard + tensorboard_log_interval ........................ 1 + tensorboard_queue_size .......................... 
5
+ tile_factor ..................................... 1
+ titles_data_path ................................ None
+ tokenizer_name_or_path .......................... None
+ tokenizer_type .................................. GPT2BPETokenizer
+ train_iters ..................................... None
+ train_samples ................................... 600000000
+ train_tokens .................................... 300000000000
+ use_checkpoint_lr_scheduler ..................... False
+ use_contiguous_buffers_in_ddp ................... False
+ use_cpu_initialization .......................... None
+ use_one_sent_docs ............................... False
+ use_pin_memory .................................. False
+ virtual_pipeline_model_parallel_size ............ None
+ vocab_extra_ids ................................. 0
+ vocab_file ...................................... /gpfswork/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-tr8b-104B/data/gpt2-vocab.json
+ weight_decay .................................... 0.1
+ world_size ...................................... 128
+ zero_allgather_bucket_size ...................... 0.0
+ zero_contigious_gradients ....................... False
+ zero_reduce_bucket_size ......................... 0.0
+ zero_reduce_scatter ............................. False
+ zero_stage ...................................... 1
+-------------------- end of arguments ---------------------
+setting number of micro-batches to constant 2048
+> building GPT2BPETokenizer tokenizer ...
+**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ****
+--------------------------------------------------
+DeepSpeed C++/CUDA extension op report
+--------------------------------------------------
+NOTE: Ops not installed will be just-in-time (JIT) compiled at
+      runtime if needed. Op compatibility means that your system
+      meet the required dependencies to JIT install the op.
+--------------------------------------------------
+JIT compiled ops requires ninja
+ninja .................. [OKAY]
+--------------------------------------------------
+op name ................ installed .. compatible
+--------------------------------------------------
+cpu_adam ............... [NO] ....... [OKAY]
+fused_adam ............. [NO] ....... [OKAY]
+fused_lamb ............. [NO] ....... [OKAY]
+sparse_attn ............ [NO] ....... [OKAY]
+transformer ............ [NO] ....... [OKAY]
+stochastic_transformer . [NO] ....... [OKAY]
+--------------------------------------------------
+> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
+> initializing torch distributed ...
+> setting tensorboard ...
+ [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
+ [WARNING]  async_io: please install the libaio-devel package with yum
+ [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
+async_io ............... [NO] ....... [NO]
+transformer_inference .. [NO] ....... [OKAY]
+utils .................. [NO] ....... [OKAY]
+quantizer .............. [NO] ....... [OKAY]
+--------------------------------------------------
+DeepSpeed general environment info:
+torch install path ............... ['/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch']
+torch version .................... 1.8.1
+torch cuda version ............... 11.1
+nvcc version ..................... 11.2
+deepspeed install path ........... ['/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed']
+deepspeed info ................... 0.5.5+57dee5a, 57dee5a, pp_deadlock_fix
+deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
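[editor's note] The parallelism numbers in the argument dump above are internally consistent; the following is a minimal standalone sketch (not part of the original log) that checks the arithmetic, using only values that appear in the log itself:

    # world_size = tensor-parallel x pipeline-parallel x data-parallel
    tensor_mp, pipeline_mp, data_parallel = 4, 32, 1
    assert tensor_mp * pipeline_mp * data_parallel == 128      # world_size

    # "setting number of micro-batches to constant 2048": each data-parallel
    # replica accumulates global_batch_size / (micro_batch_size * DP) micro-batches
    global_batch_size, micro_batch_size = 2048, 1
    assert global_batch_size // (micro_batch_size * data_parallel) == 2048

    # "> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)":
    # the vocab is padded up to a multiple of make_vocab_size_divisible_by * TP
    vocab, divisor = 50257, 128 * tensor_mp                    # divisor = 512
    padded = ((vocab + divisor - 1) // divisor) * divisor
    assert padded == 50688 and padded - vocab == 431

    # The topology printed further below maps ProcessCoord(pipe, data, model)
    # to a flat rank as rank = pipe * tensor_mp + model when DP == 1,
    # e.g. ProcessCoord(pipe=1, data=0, model=0) -> rank 4
    assert 1 * tensor_mp + 0 == 4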
+**** Git info for Megatron: git_hash=df7a9d9 git_branch=main ****
+> initializing tensor model parallel with size 4
+> initializing pipeline model parallel with size 32
+> setting random seeds to 43 ...
+[2021-10-21 21:27:53,188] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2761 and data parallel seed: 43
+> compiling dataset index builder ...
+make: Entering directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/data'
+make: Nothing to be done for 'default'.
+make: Leaving directory '/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/data'
+>>> done with dataset index builder. Compilation time: 0.313 seconds
+WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.
+> compiling and loading fused kernels ...
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: 
+
+                               !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler Pytorch was
+built with for this platform, which is g++ on linux. Please
+use g++ to to compile your extension. Alternatively, you may
+compile PyTorch from source using c++, and then you can also use
+c++ to compile your extension.
+
+See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
+with compiling PyTorch from source.
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+                               !! WARNING !!
+
+  warnings.warn(WRONG_COMPILER_WARNING.format(
+Detected CUDA files, patching ldflags
+Emitting ninja build file /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/fused_kernels/build/build.ninja...
+Building extension module fused_mix_prec_layer_norm_cuda...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+ninja: no work to do.
+Loading extension module fused_mix_prec_layer_norm_cuda...
+>>> done with compiling and loading fused kernels. Compilation time: 5.317 seconds
+time to initialize megatron (seconds): 62.895
+[after megatron is initialized] datetime: 2021-10-21 21:27:58 
+building GPT model ... 
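[editor's note] The "model parallel seed: 2761" logged above follows from the base seed: Megatron's model_parallel_cuda_manual_seed derives a per-tensor-rank seed by adding a fixed offset plus the tensor-parallel rank, while the data-parallel seed is the base seed itself. A minimal sketch of that derivation, assuming the 2021-era megatron/mpu/random.py convention (the fixed offset 2718 is confirmed by the log's own numbers: 43 + 2718 + 0 = 2761):

    BASE_SEED = 43          # --seed from the argument dump
    MP_SEED_OFFSET = 2718   # constant offset assumed from Megatron's random.py

    def seeds_for(tp_rank):
        """Return (model_parallel_seed, data_parallel_seed) for a tensor-parallel rank."""
        return BASE_SEED + MP_SEED_OFFSET + tp_rank, BASE_SEED

    assert seeds_for(0) == (2761, 43)  # the values logged by global rank 0 above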
+[2021-10-21 21:27:58,952] [INFO] [utils.py:806:see_memory_usage] Before Building Model +[2021-10-21 21:27:58,953] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB +[2021-10-21 21:27:58,953] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.11 GB, percent = 21.4% +SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None +Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=1, data=0, model=0): 4, ProcessCoord(pipe=1, data=0, model=1): 5, ProcessCoord(pipe=1, data=0, model=2): 6, ProcessCoord(pipe=1, data=0, model=3): 7, ProcessCoord(pipe=2, data=0, model=0): 8, ProcessCoord(pipe=2, data=0, model=1): 9, ProcessCoord(pipe=2, data=0, model=2): 10, ProcessCoord(pipe=2, data=0, model=3): 11, ProcessCoord(pipe=3, data=0, model=0): 12, ProcessCoord(pipe=3, data=0, model=1): 13, ProcessCoord(pipe=3, data=0, model=2): 14, ProcessCoord(pipe=3, data=0, model=3): 15, ProcessCoord(pipe=4, data=0, model=0): 16, ProcessCoord(pipe=4, data=0, model=1): 17, ProcessCoord(pipe=4, data=0, model=2): 18, ProcessCoord(pipe=4, data=0, model=3): 19, ProcessCoord(pipe=5, data=0, model=0): 20, ProcessCoord(pipe=5, data=0, model=1): 21, ProcessCoord(pipe=5, data=0, model=2): 22, ProcessCoord(pipe=5, data=0, model=3): 23, ProcessCoord(pipe=6, data=0, model=0): 24, ProcessCoord(pipe=6, data=0, model=1): 25, ProcessCoord(pipe=6, data=0, model=2): 26, ProcessCoord(pipe=6, data=0, model=3): 27, ProcessCoord(pipe=7, data=0, model=0): 28, ProcessCoord(pipe=7, data=0, model=1): 29, ProcessCoord(pipe=7, data=0, model=2): 30, ProcessCoord(pipe=7, data=0, model=3): 31, ProcessCoord(pipe=8, data=0, model=0): 32, ProcessCoord(pipe=8, data=0, model=1): 33, ProcessCoord(pipe=8, data=0, model=2): 34, ProcessCoord(pipe=8, data=0, model=3): 35, ProcessCoord(pipe=9, data=0, model=0): 36, ProcessCoord(pipe=9, data=0, model=1): 37, ProcessCoord(pipe=9, data=0, model=2): 38, ProcessCoord(pipe=9, data=0, model=3): 39, ProcessCoord(pipe=10, data=0, model=0): 40, ProcessCoord(pipe=10, data=0, model=1): 41, ProcessCoord(pipe=10, data=0, model=2): 42, ProcessCoord(pipe=10, data=0, model=3): 43, ProcessCoord(pipe=11, data=0, model=0): 44, ProcessCoord(pipe=11, data=0, model=1): 45, ProcessCoord(pipe=11, data=0, model=2): 46, ProcessCoord(pipe=11, data=0, model=3): 47, ProcessCoord(pipe=12, data=0, model=0): 48, ProcessCoord(pipe=12, data=0, model=1): 49, ProcessCoord(pipe=12, data=0, model=2): 50, ProcessCoord(pipe=12, data=0, model=3): 51, ProcessCoord(pipe=13, data=0, model=0): 52, ProcessCoord(pipe=13, data=0, model=1): 53, ProcessCoord(pipe=13, data=0, model=2): 54, ProcessCoord(pipe=13, data=0, model=3): 55, ProcessCoord(pipe=14, data=0, model=0): 56, ProcessCoord(pipe=14, data=0, model=1): 57, ProcessCoord(pipe=14, data=0, model=2): 58, ProcessCoord(pipe=14, data=0, model=3): 59, ProcessCoord(pipe=15, data=0, model=0): 60, ProcessCoord(pipe=15, data=0, model=1): 61, ProcessCoord(pipe=15, data=0, model=2): 62, ProcessCoord(pipe=15, data=0, model=3): 63, ProcessCoord(pipe=16, data=0, model=0): 64, ProcessCoord(pipe=16, data=0, model=1): 65, ProcessCoord(pipe=16, data=0, model=2): 66, ProcessCoord(pipe=16, data=0, model=3): 67, ProcessCoord(pipe=17, data=0, model=0): 68, ProcessCoord(pipe=17, data=0, model=1): 69, ProcessCoord(pipe=17, data=0, model=2): 70, ProcessCoord(pipe=17, data=0, model=3): 71, ProcessCoord(pipe=18, data=0, model=0): 72, 
ProcessCoord(pipe=18, data=0, model=1): 73, ProcessCoord(pipe=18, data=0, model=2): 74, ProcessCoord(pipe=18, data=0, model=3): 75, ProcessCoord(pipe=19, data=0, model=0): 76, ProcessCoord(pipe=19, data=0, model=1): 77, ProcessCoord(pipe=19, data=0, model=2): 78, ProcessCoord(pipe=19, data=0, model=3): 79, ProcessCoord(pipe=20, data=0, model=0): 80, ProcessCoord(pipe=20, data=0, model=1): 81, ProcessCoord(pipe=20, data=0, model=2): 82, ProcessCoord(pipe=20, data=0, model=3): 83, ProcessCoord(pipe=21, data=0, model=0): 84, ProcessCoord(pipe=21, data=0, model=1): 85, ProcessCoord(pipe=21, data=0, model=2): 86, ProcessCoord(pipe=21, data=0, model=3): 87, ProcessCoord(pipe=22, data=0, model=0): 88, ProcessCoord(pipe=22, data=0, model=1): 89, ProcessCoord(pipe=22, data=0, model=2): 90, ProcessCoord(pipe=22, data=0, model=3): 91, ProcessCoord(pipe=23, data=0, model=0): 92, ProcessCoord(pipe=23, data=0, model=1): 93, ProcessCoord(pipe=23, data=0, model=2): 94, ProcessCoord(pipe=23, data=0, model=3): 95, ProcessCoord(pipe=24, data=0, model=0): 96, ProcessCoord(pipe=24, data=0, model=1): 97, ProcessCoord(pipe=24, data=0, model=2): 98, ProcessCoord(pipe=24, data=0, model=3): 99, ProcessCoord(pipe=25, data=0, model=0): 100, ProcessCoord(pipe=25, data=0, model=1): 101, ProcessCoord(pipe=25, data=0, model=2): 102, ProcessCoord(pipe=25, data=0, model=3): 103, ProcessCoord(pipe=26, data=0, model=0): 104, ProcessCoord(pipe=26, data=0, model=1): 105, ProcessCoord(pipe=26, data=0, model=2): 106, ProcessCoord(pipe=26, data=0, model=3): 107, ProcessCoord(pipe=27, data=0, model=0): 108, ProcessCoord(pipe=27, data=0, model=1): 109, ProcessCoord(pipe=27, data=0, model=2): 110, ProcessCoord(pipe=27, data=0, model=3): 111, ProcessCoord(pipe=28, data=0, model=0): 112, ProcessCoord(pipe=28, data=0, model=1): 113, ProcessCoord(pipe=28, data=0, model=2): 114, ProcessCoord(pipe=28, data=0, model=3): 115, ProcessCoord(pipe=29, data=0, model=0): 116, ProcessCoord(pipe=29, data=0, model=1): 117, ProcessCoord(pipe=29, data=0, model=2): 118, ProcessCoord(pipe=29, data=0, model=3): 119, ProcessCoord(pipe=30, data=0, model=0): 120, ProcessCoord(pipe=30, data=0, model=1): 121, ProcessCoord(pipe=30, data=0, model=2): 122, ProcessCoord(pipe=30, data=0, model=3): 123, ProcessCoord(pipe=31, data=0, model=0): 124, ProcessCoord(pipe=31, data=0, model=1): 125, ProcessCoord(pipe=31, data=0, model=2): 126, ProcessCoord(pipe=31, data=0, model=3): 127} +[2021-10-21 21:28:00,626] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer +stage=0 layers=5 + 0: _to_float16 + 1: EmbeddingPipe + 2: + 3: ParallelTransformerLayerPipe + 4: ParallelTransformerLayerPipe +stage=1 layers=2 + 5: ParallelTransformerLayerPipe + 6: ParallelTransformerLayerPipe +stage=2 layers=2 + 7: ParallelTransformerLayerPipe + 8: ParallelTransformerLayerPipe +stage=3 layers=2 + 9: ParallelTransformerLayerPipe + 10: ParallelTransformerLayerPipe +stage=4 layers=2 + 11: ParallelTransformerLayerPipe + 12: ParallelTransformerLayerPipe +stage=5 layers=2 + 13: ParallelTransformerLayerPipe + 14: ParallelTransformerLayerPipe +stage=6 layers=2 + 15: ParallelTransformerLayerPipe + 16: ParallelTransformerLayerPipe +stage=7 layers=2 + 17: ParallelTransformerLayerPipe + 18: ParallelTransformerLayerPipe +stage=8 layers=2 + 19: ParallelTransformerLayerPipe + 20: ParallelTransformerLayerPipe +stage=9 layers=2 + 21: ParallelTransformerLayerPipe + 22: ParallelTransformerLayerPipe +stage=10 layers=2 + 23: ParallelTransformerLayerPipe + 24: 
ParallelTransformerLayerPipe +stage=11 layers=2 + 25: ParallelTransformerLayerPipe + 26: ParallelTransformerLayerPipe +stage=12 layers=2 + 27: ParallelTransformerLayerPipe + 28: ParallelTransformerLayerPipe +stage=13 layers=2 + 29: ParallelTransformerLayerPipe + 30: ParallelTransformerLayerPipe +stage=14 layers=2 + 31: ParallelTransformerLayerPipe + 32: ParallelTransformerLayerPipe +stage=15 layers=2 + 33: ParallelTransformerLayerPipe + 34: ParallelTransformerLayerPipe +stage=16 layers=2 + 35: ParallelTransformerLayerPipe + 36: ParallelTransformerLayerPipe +stage=17 layers=2 + 37: ParallelTransformerLayerPipe + 38: ParallelTransformerLayerPipe +stage=18 layers=2 + 39: ParallelTransformerLayerPipe + 40: ParallelTransformerLayerPipe +stage=19 layers=2 + 41: ParallelTransformerLayerPipe + 42: ParallelTransformerLayerPipe +stage=20 layers=2 + 43: ParallelTransformerLayerPipe + 44: ParallelTransformerLayerPipe +stage=21 layers=2 + 45: ParallelTransformerLayerPipe + 46: ParallelTransformerLayerPipe +stage=22 layers=2 + 47: ParallelTransformerLayerPipe + 48: ParallelTransformerLayerPipe +stage=23 layers=2 + 49: ParallelTransformerLayerPipe + 50: ParallelTransformerLayerPipe +stage=24 layers=2 + 51: ParallelTransformerLayerPipe + 52: ParallelTransformerLayerPipe +stage=25 layers=2 + 53: ParallelTransformerLayerPipe + 54: ParallelTransformerLayerPipe +stage=26 layers=2 + 55: ParallelTransformerLayerPipe + 56: ParallelTransformerLayerPipe +stage=27 layers=2 + 57: ParallelTransformerLayerPipe + 58: ParallelTransformerLayerPipe +stage=28 layers=2 + 59: ParallelTransformerLayerPipe + 60: ParallelTransformerLayerPipe +stage=29 layers=2 + 61: ParallelTransformerLayerPipe + 62: ParallelTransformerLayerPipe +stage=30 layers=2 + 63: ParallelTransformerLayerPipe + 64: ParallelTransformerLayerPipe +stage=31 layers=6 + 65: ParallelTransformerLayerPipe + 66: ParallelTransformerLayerPipe + 67: + 68: MixedFusedLayerNorm + 69: EmbeddingPipe + 70: float16_to_fp32 + loss: CrossEntropy + > number of parameters on (tensor, pipeline) model parallel rank (3, 13): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 5): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (0, 5): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 26): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 27): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 24): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 24): 807539800 > number of parameters on (tensor, pipeline) model parallel rank (0, 24): 807539800 + + > number of parameters on (tensor, pipeline) model parallel rank (1, 24): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 807539800 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 807539800 + + > number of parameters on (tensor, pipeline) model parallel rank (2, 15): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (1, 15): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (2, 4): 807539800 + > number of parameters on (tensor, pipeline) model parallel rank (3, 15): 807539800 + > number of 
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 13): 807539800
[... every (tensor, pipeline) rank on pipeline stages 1-30 reports the same count, 807539800; the interleaved per-rank lines are elided ...]
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 978291800
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 978291800
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 978291800
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (1, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (2, 31): 978315000
+ > number of parameters on (tensor, pipeline) model parallel rank (3, 31): 978315000
+Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...
[... the same "PyTorch extensions root" line is printed once per rank; duplicates elided ...]
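Note: the per-rank counts above are consistent with Megatron-style tensor slicing. A quick arithmetic check in Python, assuming a hidden size of 11600 -- the hidden size is not printed in this excerpt; it is the unique integer h for which 12*h^2 matches these numbers:

    h = 11600      # assumed hidden size
    tp = 4         # tensor-parallel degree (the model axis of the grid)
    layers = 2     # transformer layers per mid-pipeline stage

    # Per layer: QKV (3h^2) + attention output (h^2) + MLP up/down (8h^2) = 12h^2,
    # sliced 4-way across the tensor-parallel ranks.
    sliced_weights = layers * 12 * h * h // tp
    assert sliced_weights == 807_360_000            # the ZeRO partition size below

    # Plus 179,800 layernorm/bias parameters per rank (broken down further down):
    assert sliced_weights + 179_800 == 807_539_800  # the count in the log

The first and last pipeline stages report larger counts (978291800 / 978315000) because they additionally hold the tensor-sliced tied-embedding shard, and stage 31 the final layernorm.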
+[2021-10-21 21:28:01,340] [INFO] [utils.py:806:see_memory_usage] After Building Model
+[2021-10-21 21:28:01,341] [INFO] [utils.py:807:see_memory_usage] MA 1.88 GB Max_MA 1.88 GB CA 1.91 GB Max_CA 2 GB
+[2021-10-21 21:28:01,341] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.28 GB, percent = 21.5%
+ > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 978291800
+setting training iterations to 292968
+> learning rate decay style: cosine
+DeepSpeed is enabled.
+[2021-10-21 21:28:01,342] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+57dee5a, git-hash=57dee5a, git-branch=pp_deadlock_fix
[... more per-rank "PyTorch extensions root" lines elided ...]
+[2021-10-21 21:28:01,379] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
+[2021-10-21 21:28:01,379] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
+[2021-10-21 21:28:01,379] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
+[2021-10-21 21:28:01,380] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
+[2021-10-21 21:28:01,380] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
+[2021-10-21 21:28:01,380] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
+[2021-10-21 21:28:01,380] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
+[2021-10-21 21:28:01,380] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
+[2021-10-21 21:28:01,380] [INFO] [stage2.py:113:__init__] CPU Offload: False
+[2021-10-21 21:28:01,380] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
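Note: the engine-init lines above map one-to-one onto the ZeRO section of the DeepSpeed JSON config. A plausible reconstruction follows -- an assumption for illustration; the run's actual config file is not part of this log:

    ds_config = {
        "fp16": {"enabled": True},                 # "Creating fp16 ZeRO stage 1 optimizer"
        "zero_optimization": {
            "stage": 1,                            # partition optimizer states only
            "reduce_bucket_size": 500_000_000,     # "Reduce bucket size 500000000"
            "allgather_bucket_size": 500_000_000,  # "Allgather bucket size 500000000"
        },
    }
    # With a data-parallel size of 1 (see the topology dump), each ZeRO-1
    # parameter group has a single shard -- hence the "partition count [1, 1]"
    # printed by every rank further down.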
+Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...
+/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
+
+ !! WARNING !!
+
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+Your compiler (c++) is not compatible with the compiler PyTorch was
+built with for this platform, which is g++ on linux. Please
+use g++ to compile your extension. Alternatively, you may
+compile PyTorch from source using c++, and then you can also use
+c++ to compile your extension.
+
+See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
+with compiling PyTorch from source.
+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+ !! WARNING !!
+
+ warnings.warn(WRONG_COMPILER_WARNING.format(
+Emitting ninja build file /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions/utils/build.ninja...
+Building extension module utils...
+Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
+[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/include/THC -isystem /gpfswork/rech/six/commun/conda/cutting-edge/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/DeepSpeed/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
+[2/2] c++ flatten_unflatten.o -shared -L/gpfswork/rech/six/commun/conda/cutting-edge/lib/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so
+Loading extension module utils...
+Time to load utils op: 12.890349864959717 seconds
[... each remaining rank likewise prints "Loading extension module utils..."; duplicates elided ...]
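Note: the ~13-second "utils" build above is PyTorch's JIT C++-extension mechanism at work. A standalone sketch using the same public API -- the source path is taken from the [1/2] compile line; one process compiles, the others wait on a file lock and reuse the cached utils.so:

    from torch.utils.cpp_extension import load

    # JIT-compiles the op with ninja into the torch_extensions cache directory
    # and imports the resulting shared library as a Python module.
    utils = load(
        name="utils",
        sources=["DeepSpeed/deepspeed/ops/csrc/utils/flatten_unflatten.cpp"],
        verbose=True,  # prints the "Emitting ninja build file ..." lines seen above
    )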
[... each of the remaining ranks reports "Time to load utils op: ..." between roughly 12.85 and 12.97 seconds; the interleaved per-rank lines are elided ...]
+Rank: 55 partition count [1, 1] and sizes[(807360000, False), (179800, False)]
[... the ranks on pipeline stages 1-30 all report partition count [1, 1] and sizes[(807360000, False), (179800, False)]; near-identical lines elided ...]
+Rank: 0 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 1 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 2 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 3 partition count [1, 1] and sizes[(978112000, False), (179800, False)]
+Rank: 124 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 125 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 126 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Rank: 127 partition count [1, 1] and sizes[(978112000, False), (203000, False)]
+Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...
+No modifications detected for re-loaded extension module utils, skipping build step...
+Loading extension module utils...
+Time to load utils op: 0.0022156238555908203 seconds
[... the cached extension is re-loaded the same way on every other rank, in roughly 0.001-0.002 seconds each; interleaved duplicates elided ...]
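Note: the two sizes in each partition line are the optimizer's two parameter groups, plausibly the weight-decay (matrix) group and the no-decay (bias/layernorm) group. Both check out exactly under the same assumed hidden size h = 11600:

    h, tp, layers = 11600, 4, 2   # assumed hidden size, tensor-parallel degree, layers/stage

    # Group 1: weight matrices, sliced 4-way across tensor-parallel ranks.
    assert layers * 12 * h * h // tp == 807_360_000

    # Group 2, per layer:
    #   two layernorms (weight + bias, replicated)  -> 4h
    #   QKV bias (column-sliced)                    -> 3h/tp
    #   attention-output bias (replicated)          -> h
    #   MLP h->4h bias (column-sliced)              -> 4h/tp
    #   MLP 4h->h bias (replicated)                 -> h
    per_layer = 4 * h + 3 * h // tp + h + 4 * h // tp + h
    assert layers * per_layer == 179_800

    # The last pipeline stage also owns the final MixedFusedLayerNorm (2h),
    # which is why ranks 124-127 report 179_800 + 23_200 = 203_000.
    assert layers * per_layer + 2 * h == 203_000

The larger first group on ranks 0-3 and 124-127 (978112000) additionally holds the tensor-sliced tied-embedding shard on the first and last stages.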
+Time to load utils op: 0.00139617919921875 secondsTime to load utils op: 0.001287698745727539 seconds + +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Time to load utils op: 0.0011546611785888672 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Time to load utils op: 0.001371145248413086 seconds +Time to load utils op: 0.0012905597686767578 seconds +Time to load utils op: 0.0011281967163085938 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0012793540954589844 seconds +Time to load utils op: 0.0012006759643554688 secondsTime to load utils op: 0.0013451576232910156 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... + +Time to load utils op: 0.0010671615600585938 seconds +Time to load utils op: 0.000934600830078125 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Time to load utils op: 0.0012030601501464844 secondsTime to load utils op: 0.0010116100311279297 seconds + +No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + +Loading extension module utils...Loading extension module utils... + +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0012400150299072266 seconds +Time to load utils op: 0.0010917186737060547 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... + +Time to load utils op: 0.001117706298828125 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0011112689971923828 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Loading extension module utils... 
+No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0009090900421142578 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0012853145599365234 seconds +No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + +Loading extension module utils...Loading extension module utils... + +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils...No modifications detected for re-loaded extension module utils, skipping build step... + +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0013110637664794922 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Loading extension module utils... +Time to load utils op: 0.0012106895446777344 seconds +Time to load utils op: 0.0010652542114257812 seconds +Loading extension module utils... +Time to load utils op: 0.0012142658233642578 seconds +Loading extension module utils... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0014040470123291016 seconds +Time to load utils op: 0.0009138584136962891 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0010476112365722656 seconds +Time to load utils op: 0.00101470947265625 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0013113021850585938 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + +Loading extension module utils... +Time to load utils op: 0.0012836456298828125 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0011701583862304688 seconds +Time to load utils op: 0.0014300346374511719 seconds +Time to load utils op: 0.0012028217315673828 seconds +Time to load utils op: 0.0009922981262207031 seconds +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + +Loading extension module utils...Loading extension module utils... + +Loading extension module utils... 
+No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0011448860168457031 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + +Time to load utils op: 0.001007080078125 secondsTime to load utils op: 0.0010521411895751953 seconds + +Loading extension module utils...Loading extension module utils... + +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0011069774627685547 seconds +Time to load utils op: 0.001329183578491211 seconds +Time to load utils op: 0.0009720325469970703 seconds +Time to load utils op: 0.0013270378112792969 seconds +Time to load utils op: 0.0010533332824707031 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0012021064758300781 seconds +Time to load utils op: 0.0010991096496582031 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0010781288146972656 seconds +Time to load utils op: 0.0011827945709228516 seconds +Loading extension module utils... +Time to load utils op: 0.0012593269348144531 seconds +Time to load utils op: 0.0013196468353271484 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0011854171752929688 seconds +Time to load utils op: 0.001041412353515625 seconds +Time to load utils op: 0.00118255615234375 seconds +Time to load utils op: 0.0010650157928466797 seconds +No modifications detected for re-loaded extension module utils, skipping build step... +Time to load utils op: 0.0012547969818115234 seconds +Time to load utils op: 0.0012295246124267578 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... + +Loading extension module utils... 
+Time to load utils op: 0.0014696121215820312 seconds +Time to load utils op: 0.0010030269622802734 seconds +Time to load utils op: 0.0014066696166992188 seconds +Time to load utils op: 0.0009477138519287109 seconds +Time to load utils op: 0.0011942386627197266 secondsTime to load utils op: 0.0013849735260009766 seconds + +Time to load utils op: 0.0009655952453613281 seconds +Time to load utils op: 0.0014467239379882812 seconds +Time to load utils op: 0.0010600090026855469 seconds +Time to load utils op: 0.0011093616485595703 seconds +Time to load utils op: 0.0015153884887695312 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Time to load utils op: 0.0013065338134765625 seconds +Time to load utils op: 0.0012111663818359375 seconds +Time to load utils op: 0.0015406608581542969 seconds +Time to load utils op: 0.0013866424560546875 seconds +Time to load utils op: 0.0012335777282714844 seconds +Time to load utils op: 0.0014047622680664062 seconds +Time to load utils op: 0.0013852119445800781 seconds +Time to load utils op: 0.0014393329620361328 seconds +No modifications detected for re-loaded extension module utils, skipping build step...No modifications detected for re-loaded extension module utils, skipping build step... + +Loading extension module utils... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... +Time to load utils op: 0.0018761157989501953 seconds +Time to load utils op: 0.0017096996307373047 seconds +Time to load utils op: 0.0019643306732177734 seconds +Time to load utils op: 0.0016758441925048828 seconds +Time to load utils op: 0.0018913745880126953 seconds +Time to load utils op: 0.0017914772033691406 seconds +Time to load utils op: 0.0020761489868164062 seconds +Time to load utils op: 0.0019230842590332031 seconds +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... + +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... + +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... + +No modifications detected for re-loaded extension module utils, skipping build step... +No modifications detected for re-loaded extension module utils, skipping build step...Loading extension module utils... + +Loading extension module utils... 
+[2021-10-21 21:28:16,324] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
+[2021-10-21 21:28:16,324] [INFO] [utils.py:807:see_memory_usage] MA 5.47 GB Max_MA 7.29 GB CA 9.25 GB Max_CA 9 GB
+[2021-10-21 21:28:16,324] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.3 GB, percent = 21.5%
+Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root...
+No modifications detected for re-loaded extension module utils, skipping build step...
+Loading extension module utils...
+Time to load utils op: 0.0010251998901367188 seconds +Time to load utils op: 0.0013473033905029297 seconds +Time to load utils op: 0.0012462139129638672 seconds +Time to load utils op: 0.0012161731719970703 seconds +[2021-10-21 21:28:16,378] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states +[2021-10-21 21:28:16,379] [INFO] [utils.py:807:see_memory_usage] MA 12.76 GB Max_MA 16.41 GB CA 20.19 GB Max_CA 20 GB +[2021-10-21 21:28:16,379] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.3 GB, percent = 21.5% +[2021-10-21 21:28:16,379] [INFO] [stage2.py:474:__init__] optimizer state initialized +[2021-10-21 21:28:16,414] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer +[2021-10-21 21:28:16,414] [INFO] [utils.py:807:see_memory_usage] MA 12.76 GB Max_MA 12.76 GB CA 20.19 GB Max_CA 20 GB +[2021-10-21 21:28:16,415] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.3 GB, percent = 21.5% +[2021-10-21 21:28:16,415] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam +[2021-10-21 21:28:16,415] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler +[2021-10-21 21:28:16,415] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = +[2021-10-21 21:28:16,415] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] +[2021-10-21 21:28:16,415] [INFO] [config.py:940:print] DeepSpeedEngine configuration: +[2021-10-21 21:28:16,415] [INFO] [config.py:944:print] activation_checkpointing_config { + "partition_activations": false, + "contiguous_memory_optimization": false, + "cpu_checkpointing": false, + "number_checkpoints": null, + "synchronize_checkpoint_boundary": false, + "profile": false +} +[2021-10-21 21:28:16,415] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +[2021-10-21 21:28:16,415] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False +[2021-10-21 21:28:16,415] [INFO] [config.py:944:print] amp_enabled .................. False +[2021-10-21 21:28:16,415] [INFO] [config.py:944:print] amp_params ................... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] curriculum_enabled ........... True +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] curriculum_params ............ {'curriculum_type': 'seqlen', 'min_difficulty': 64, 'max_difficulty': 2048, 'schedule_type': 'fixed_linear', 'schedule_config': {'total_curriculum_step': 36000, 'difficulty_step': 8}} +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] dataloader_drop_last ......... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] disable_allgather ............ False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] dump_state ................... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_enabled ........... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_layer_name ........ 
bert.encoder.layer +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] eigenvalue_verbose ........... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] elasticity_enabled ........... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] flops_profiler_config ........ { + "enabled": false, + "profile_step": 1, + "module_depth": -1, + "top_modules": 1, + "detailed": true, + "output_file": null +} +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] fp16_enabled ................. True +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] global_rank .................. 0 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] gradient_accumulation_steps .. 2048 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] loss_scale ................... 0 +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] memory_breakdown ............. False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] optimizer_name ............... None +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] optimizer_params ............. None +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] pld_enabled .................. False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] pld_params ................... False +[2021-10-21 21:28:16,416] [INFO] [config.py:944:print] prescale_gradients ........... False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_groups .............. 1 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_offset .............. 1000 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_period .............. 1000 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_rounding ............ 0 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_start_bits .......... 16 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_target_bits ......... 8 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_training_enabled .... False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_type ................ 0 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] quantize_verbose ............. False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] scheduler_name ............... None +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] scheduler_params ............. 
None +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] sparse_attention ............. None +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] steps_per_print .............. 2000 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] tensorboard_enabled .......... False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] tensorboard_output_path ...... +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] train_batch_size ............. 2048 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] use_quantizer_kernel ......... False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] wall_clock_breakdown ......... False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] world_size ................... 1 +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] zero_allow_untested_optimizer False +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] zero_config .................. { + "stage": 1, + "contiguous_gradients": true, + "reduce_scatter": true, + "reduce_bucket_size": 5.000000e+08, + "allgather_partitions": true, + "allgather_bucket_size": 5.000000e+08, + "overlap_comm": false, + "load_from_fp32_weights": true, + "elastic_checkpoint": true, + "offload_param": null, + "offload_optimizer": null, + "sub_group_size": 1.000000e+09, + "prefetch_bucket_size": 5.000000e+07, + "param_persistence_threshold": 1.000000e+05, + "max_live_parameters": 1.000000e+09, + "max_reuse_distance": 1.000000e+09, + "gather_fp16_weights_on_model_save": false, + "ignore_unused_parameters": true, + "round_robin_gradients": false, + "legacy_stage1": false +} +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] zero_enabled ................. True +[2021-10-21 21:28:16,417] [INFO] [config.py:944:print] zero_optimization_stage ...... 1 +[2021-10-21 21:28:16,418] [INFO] [config.py:946:print] json = { + "train_micro_batch_size_per_gpu": 1, + "train_batch_size": 2.048000e+03, + "gradient_clipping": 1.0, + "zero_optimization": { + "stage": 1 + }, + "fp16": { + "enabled": true, + "loss_scale": 0, + "loss_scale_window": 500, + "hysteresis": 2, + "min_loss_scale": 1, + "initial_scale_power": 12 + }, + "curriculum_learning": { + "enabled": true, + "curriculum_type": "seqlen", + "min_difficulty": 64, + "max_difficulty": 2.048000e+03, + "schedule_type": "fixed_linear", + "schedule_config": { + "total_curriculum_step": 3.600000e+04, + "difficulty_step": 8 + } + }, + "steps_per_print": 2.000000e+03, + "wall_clock_breakdown": false +} +Using /gpfsdswork/projects/rech/eha/ura81os/.cache/torch_extensions as PyTorch extensions root... +No modifications detected for re-loaded extension module utils, skipping build step... +Loading extension module utils... 
+Time to load utils op: 0.0007581710815429688 seconds +[2021-10-21 21:28:16,418] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=2048 micro_batch_size=1 +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=5 [0, 5) STAGE_PARAMS=978291800 (978.292M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=96 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=98 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=67 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=65 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=97 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=64 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=66 STAGE=16 LAYERS=2 [35, 37) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=99 STAGE=24 LAYERS=2 [51, 53) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=9 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=33 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=34 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=35 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=48 STAGE=12 
LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=51 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=50 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=49 STAGE=12 LAYERS=2 [27, 29) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=90 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=91 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=32 STAGE=8 LAYERS=2 [19, 21) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=18 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=16 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=17 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=11 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=8 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=112 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=114 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=115 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=81 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=82 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=83 STAGE=20 LAYERS=2 
[43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=80 STAGE=20 LAYERS=2 [43, 45) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=88 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=89 STAGE=22 LAYERS=2 [47, 49) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=12 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=14 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=19 STAGE=4 LAYERS=2 [11, 13) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=10 STAGE=2 LAYERS=2 [7, 9) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=113 STAGE=28 LAYERS=2 [59, 61) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=75 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=73 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=84 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=87 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=120 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=121 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=106 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=107 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=105 STAGE=26 LAYERS=2 
[55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=13 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=59 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=58 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=56 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=25 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=24 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=70 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=124 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=72 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=74 STAGE=18 LAYERS=2 [39, 41) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=111 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=118 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=40 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=42 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=86 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=85 STAGE=21 LAYERS=2 [45, 47) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=95 STAGE=23 LAYERS=2 
[49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=20 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=21 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=122 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=123 STAGE=30 LAYERS=2 [63, 65) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=104 STAGE=26 LAYERS=2 [55, 57) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=15 STAGE=3 LAYERS=2 [9, 11) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=7 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=4 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=5 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=62 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=101 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=102 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=57 STAGE=14 LAYERS=2 [31, 33) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=26 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=27 STAGE=6 LAYERS=2 [15, 17) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=29 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=71 STAGE=17 LAYERS=2 [37, 39) 
STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=69 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=68 STAGE=17 LAYERS=2 [37, 39) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=44 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=127 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=125 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=126 STAGE=31 LAYERS=6 [65, 71) STAGE_PARAMS=978315000 (978.315M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=77 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=76 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=109 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=108 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=119 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=117 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=116 STAGE=29 LAYERS=2 [61, 63) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=36 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=39 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=38 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=41 STAGE=10 LAYERS=2 [23, 
25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=43 STAGE=10 LAYERS=2 [23, 25) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=94 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=92 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=93 STAGE=23 LAYERS=2 [49, 51) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=22 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=23 STAGE=5 LAYERS=2 [13, 15) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=52 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=53 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=6 STAGE=1 LAYERS=2 [5, 7) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=63 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=60 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=61 STAGE=15 LAYERS=2 [33, 35) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=100 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=103 STAGE=25 LAYERS=2 [53, 55) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=30 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=28 STAGE=7 LAYERS=2 [17, 19) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M) +[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=31 STAGE=7 LAYERS=2 [17, 19) 
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=46 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=45 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=47 STAGE=11 LAYERS=2 [25, 27) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=78 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=79 STAGE=19 LAYERS=2 [41, 43) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=110 STAGE=27 LAYERS=2 [57, 59) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=37 STAGE=9 LAYERS=2 [21, 23) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=55 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
+[2021-10-21 21:28:16,810] [INFO] [engine.py:151:__init__] RANK=54 STAGE=13 LAYERS=2 [29, 31) STAGE_PARAMS=807539800 (807.540M) TOTAL_PARAMS=104731203200 (104731.203M) UNIQUE_PARAMS=104048195200 (104048.195M)
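The per-rank lines above record how the pipeline engine partitioned the model: 128 ranks covering 32 pipeline stages, with what appear to be 4 tensor-parallel shards per stage (e.g. ranks 36-39 all report STAGE=9), most stages owning 2 transformer layer specs and the last stage (STAGE=31) owning 6 specs. Below is a minimal sketch of the [start, end) interval arithmetic those lines imply; LEAD_SPECS, TRAIL_SPECS and the function name are assumptions chosen to match the logged intervals, not the engine's actual partitioning code.

    # Hypothetical reconstruction of the [start, end) spec intervals in the log,
    # assuming 3 leading specs before the first transformer block, 64 transformer
    # specs split 2-per-stage over 32 stages, and 4 trailing specs on the last stage.
    N_STAGES = 32
    LEAD_SPECS = 3        # assumed count of specs before the first transformer block
    LAYERS_PER_STAGE = 2  # 64 transformer specs / 32 stages
    TRAIL_SPECS = 4       # assumed count of specs after the last transformer block

    def stage_interval(stage: int) -> tuple:
        start = LEAD_SPECS + stage * LAYERS_PER_STAGE
        end = start + LAYERS_PER_STAGE
        if stage == N_STAGES - 1:
            end += TRAIL_SPECS  # the last stage also owns the trailing specs
        return (start, end)

    assert stage_interval(17) == (37, 39)  # matches RANK=68/69 above
    assert stage_interval(31) == (65, 71)  # matches RANK=125..127 above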
+[2021-10-21 21:28:16,897] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
+WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+ will not load any checkpoints and will start from random
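The warning above (emitted once per rank; duplicates elided) reflects DeepSpeed's checkpoint-tag convention: when load_checkpoint is called without an explicit tag, it looks for a plain-text file named latest inside the checkpoint directory and reads the tag from it, and save_checkpoint rewrites that file after every save. Here the run is fresh, the file does not exist yet, and training starts from random initialization, as the metadata warning confirms. A minimal sketch of the convention, with an illustrative path and tag:

    import os
    from typing import Optional

    ckpt_dir = "/path/to/checkpoints"  # illustrative stand-in for .../tr8b-104B/checkpoints
    latest_file = os.path.join(ckpt_dir, "latest")

    def write_latest(tag: str) -> None:
        # what a save leaves behind: a one-line file naming the newest tag,
        # e.g. "global_step1000"
        with open(latest_file, "w") as fd:
            fd.write(tag)

    def read_latest() -> Optional[str]:
        # what a tag-less load does; a missing file triggers the
        # "Unable to find latest file" warning seen above
        if not os.path.isfile(latest_file):
            return None
        with open(latest_file) as fd:
            return fd.read().strip()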
+time (ms) | load-checkpoint: 8.61
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
+ warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
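The UserWarning above (also emitted once per rank; duplicates elided) is about the same duplication visible in the engine lines earlier: with pipeline parallelism and tied embeddings, the first and last stages each hold an embedding copy, so a naive sum over stages counts those weights more than once. That is exactly the gap between TOTAL_PARAMS and UNIQUE_PARAMS in the log; only the subtraction below comes from the logged numbers, the attribution to the tied embedding copy is the warning's own explanation:

    total_params = 104_731_203_200   # TOTAL_PARAMS from the engine lines above
    unique_params = 104_048_195_200  # UNIQUE_PARAMS from the same lines

    # parameters counted more than once by the naive per-stage sum
    duplicated = total_params - unique_params
    print(f"{duplicated:,}")  # 683,008,000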
+estimated model parameters: 103.3650944
+estimated model parameters without embeddings: 103.3650944
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several 
copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 
+/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of 
the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + + +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + + +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 
+estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.2213504 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 125.2213504 +estimated model parameters: 125.2213504 +estimated model parameters: 125.2213504 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the 
embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies 
of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold 
several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + 
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944estimated model parameters without embeddings: 103.3650944 + +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 
1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters: 103.3650944 +estimated model 
parameters: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +/gpfsssd/worksf/projects/rech/six/commun/code/tr8b-104B/Megatron-DeepSpeed-conglongli/megatron/utils.py:274: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings + warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters without embeddings: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 103.3650944 +estimated model parameters: 125.22432 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 125.22432 +estimated model parameters: 103.3650944estimated model parameters: 103.3650944 + +estimated model parameters: 125.22432 +estimated model 
+estimated model parameters without embeddings: 103.368064
+[after model, optimizer, and learning rate scheduler are built] datetime: 2021-10-21 21:28:16
+> building train, validation, and test datasets ...
+ > datasets target sizes (minimum size):
+ train: 600000000
+ validation: 3000320
+ test: 10240
+> building train, validation, and test datasets for GPT ...
+ > building dataset index ...
+ reading sizes...
+ reading pointers...
+ reading document index...
+ creating numpy buffer of mmap...
+ creating memory view of numpy buffer...
+ > finished creating indexed dataset in 0.363446 seconds
+ number of documents: 304230423
+ > dataset split:
+ train:
+ document indices in [0, 288714672) total of 288714672 documents
+ validation:
+ document indices in [288714672, 303926193) total of 15211521 documents
+ test:
+ document indices in [303926193, 304230423) total of 304230 documents
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_600000000ns_2048sl_43s_shuffle_idx.npy
+ loaded indexed file in 0.230 seconds
+ total number of samples: 657686117
+ total number of epochs: 5
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_3000320ns_2048sl_43s_shuffle_idx.npy
+ loaded indexed file in 0.164 seconds
+ total number of samples: 6927161
+ total number of epochs: 1
+ > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_doc_idx.npy
+ > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_sample_idx.npy
+ > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_10240ns_2048sl_43s_shuffle_idx.npy
+ loaded indexed file in 0.043 seconds
+ total number of samples: 137384
+ total number of epochs: 1
+> finished creating GPT datasets ...
+[after dataloaders are built] datetime: 2021-10-21 21:28:23
+done with setup ...
+training ...
+Number of parameters: 125.2213504 billion
+time (ms) | model-and-optimizer-setup: 18012.40 | train/valid/test-data-iterators-setup: 5643.16
+Number of parameters: 103.3650944 billion
+Number of parameters: 125.22432 billion
+Number of parameters without embeddings: 103.3650944 billion
+Number of parameters without embeddings: 103.368064 billion
+Number
of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion + +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 125.22432 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.368064 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.368064 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 125.2213504 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +Number of parameters: 103.3650944 billion +Number of parameters without embeddings: 103.3650944 billion +[before the start of training step] datetime: 2021-10-21 21:28:23 +[2021-10-21 21:28:23,393] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information +[2021-10-21 21:28:23,393] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False +[2021-10-21 21:28:23,393] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 64 total layers +[2021-10-21 21:28:23,393] [INFO] [checkpointing.py:554:forward] ----Synchronization False +[2021-10-21 21:28:23,393] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False +[Rank 1] (after 1 iterations) memory (MB) | allocated: 
13202.67822265625 | max allocated: 20666.22705078125 | reserved: 24442.0 | max reserved: 24442.0 +[Rank 125] (after 1 iterations) memory (MB) | allocated: 13082.60107421875 | max allocated: 20546.20703125 | reserved: 24406.0 | max reserved: 24406.0 +[Rank 5] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20086.0 | max reserved: 20086.0 +[Rank 9] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 13] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 17] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 25] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 29] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 33] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 21] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.7158203125 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 0] (after 1 iterations) memory (MB) | allocated: 13203.03955078125 | max allocated: 20666.58837890625 | reserved: 24442.0 | max reserved: 24442.0 +[Rank 8] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 124] (after 1 iterations) memory (MB) | allocated: 13082.369140625 | max allocated: 20545.97509765625 | reserved: 24406.0 | max reserved: 24406.0 +[Rank 4] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20086.0 | max reserved: 20086.0 +[Rank 12] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 24] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 45] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 16] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.7158203125 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 20] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.7158203125 | reserved: 20084.0 | max reserved: 20084.0 +[Rank 49] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 41] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 32] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 53] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 28] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max 
allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 61] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 37] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 36] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 40] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 57] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 65] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 44] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0 +[Rank 73] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 69] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 48] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 81] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 77] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 56] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 85] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 76] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 68] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 64] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 89] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 93] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 88] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 84] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 72] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 97] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 
16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 80] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0
+[Rank 96] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 92] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 52] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0
+[Rank 60] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0
+[Rank 126] (after 1 iterations) memory (MB) | allocated: 13082.369140625 | max allocated: 20545.97509765625 | reserved: 24406.0 | max reserved: 24406.0
+[Rank 2] (after 1 iterations) memory (MB) | allocated: 13202.06298828125 | max allocated: 20665.61181640625 | reserved: 24442.0 | max reserved: 24442.0
+[Rank 3] (after 1 iterations) memory (MB) | allocated: 13203.30322265625 | max allocated: 20666.85205078125 | reserved: 24442.0 | max reserved: 24442.0
+[Rank 10] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 7] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20086.0 | max reserved: 20086.0
+[Rank 11] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 6] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20086.0 | max reserved: 20086.0
+[Rank 14] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 15] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 19] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 22] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.7158203125 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 100] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 23] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.7158203125 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 105] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 18] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.7158203125 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 112] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16948.21923828125 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 108] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 113] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.798828125 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 30] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 109] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 31] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 116] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 120] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 101] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 117] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 26] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 27] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20084.0 | max reserved: 20084.0
+[Rank 121] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0
+[Rank 35] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 34] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 38] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 39] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 47] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 46] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 50] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0
+[Rank 104] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0
+[Rank 42] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 43] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20082.0 | max reserved: 20082.0
+[Rank 51] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0
+[Rank 55] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0
+[Rank 58] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | 
reserved: 20080.0 | max reserved: 20080.0 +[Rank 54] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 59] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 63] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 62] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 66] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 71] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 70] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 67] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20080.0 | max reserved: 20080.0 +[Rank 75] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 74] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 79] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 78] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 83] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 87] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 82] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 86] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20078.0 | max reserved: 20078.0 +[Rank 91] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 95] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 94] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 90] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 99] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 98] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 103] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 
20076.0 | max reserved: 20076.0 +[Rank 102] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 106] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 107] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.29541015625 | reserved: 20076.0 | max reserved: 20076.0 +[Rank 111] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 110] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 114] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.798828125 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 119] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 115] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.798828125 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 123] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 118] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0 +[Rank 122] (after 1 iterations) memory (MB) | allocated: 10787.11376953125 | max allocated: 16947.37841796875 | reserved: 16994.0 | max reserved: 16994.0 + iteration 1/ 292968 | consumed samples: 2048 | consumed tokens: 131072 | elapsed time per iteration (ms): 204975.6 | learning rate: 5.680E-07 | global batch size: 2048 | lm loss: 1.316407E+01 | loss scale: 4096.0 | grad norm: 224806.780 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +[Rank 127] (after 1 iterations) memory (MB) | allocated: 13082.68505859375 | max allocated: 20546.291015625 | reserved: 24406.0 | max reserved: 24406.0 +time (ms) + iteration 2/ 292968 | consumed samples: 4096 | consumed tokens: 262144 | elapsed time per iteration (ms): 126852.5 | learning rate: 1.136E-06 | global batch size: 2048 | lm loss: 1.315916E+01 | loss scale: 4096.0 | grad norm: 225244.360 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 3/ 292968 | consumed samples: 6144 | consumed tokens: 393216 | elapsed time per iteration (ms): 116457.3 | learning rate: 1.704E-06 | global batch size: 2048 | lm loss: 2.324803E+01 | loss scale: 4096.0 | grad norm: 1381761.459 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 4/ 292968 | consumed samples: 8192 | consumed tokens: 524288 | elapsed time per iteration (ms): 112171.3 | learning rate: 2.272E-06 | global batch size: 2048 | lm loss: 3.475053E+01 | loss scale: 4096.0 | grad norm: 1845285.271 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 5/ 292968 | consumed samples: 10240 | consumed tokens: 655360 | elapsed time per iteration (ms): 102880.2 | learning rate: 2.840E-06 | global batch size: 2048 | lm loss: 3.745642E+01 | loss scale: 4096.0 | grad norm: 
1436900.964 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 6/ 292968 | consumed samples: 12288 | consumed tokens: 786432 | elapsed time per iteration (ms): 102783.6 | learning rate: 3.408E-06 | global batch size: 2048 | lm loss: 3.983621E+01 | loss scale: 4096.0 | grad norm: 1067945.196 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 7/ 292968 | consumed samples: 14336 | consumed tokens: 917504 | elapsed time per iteration (ms): 95986.7 | learning rate: 3.976E-06 | global batch size: 2048 | lm loss: 3.536437E+01 | loss scale: 4096.0 | grad norm: 1080819.724 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 8/ 292968 | consumed samples: 16384 | consumed tokens: 1048576 | elapsed time per iteration (ms): 92557.1 | learning rate: 4.544E-06 | global batch size: 2048 | lm loss: 3.412041E+01 | loss scale: 4096.0 | grad norm: 1023567.591 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 9/ 292968 | consumed samples: 18432 | consumed tokens: 1179648 | elapsed time per iteration (ms): 91935.4 | learning rate: 5.112E-06 | global batch size: 2048 | lm loss: 3.219579E+01 | loss scale: 4096.0 | grad norm: 654723.072 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 10/ 292968 | consumed samples: 20480 | consumed tokens: 1310720 | elapsed time per iteration (ms): 90080.9 | learning rate: 5.680E-06 | global batch size: 2048 | lm loss: 2.971920E+01 | loss scale: 4096.0 | grad norm: 537991.005 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 11/ 292968 | consumed samples: 22528 | consumed tokens: 1441792 | elapsed time per iteration (ms): 88691.3 | learning rate: 6.249E-06 | global batch size: 2048 | lm loss: 2.729292E+01 | loss scale: 4096.0 | grad norm: 424745.696 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 12/ 292968 | consumed samples: 24576 | consumed tokens: 1572864 | elapsed time per iteration (ms): 88398.6 | learning rate: 6.817E-06 | global batch size: 2048 | lm loss: 2.790564E+01 | loss scale: 4096.0 | grad norm: 644211.527 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 13/ 292968 | consumed samples: 26624 | consumed tokens: 1703936 | elapsed time per iteration (ms): 88502.3 | learning rate: 7.385E-06 | global batch size: 2048 | lm loss: 2.526423E+01 | loss scale: 4096.0 | grad norm: 454067.335 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 14/ 292968 | consumed samples: 28672 | consumed tokens: 1835008 | elapsed time per iteration (ms): 87733.4 | learning rate: 7.953E-06 | global batch size: 2048 | lm loss: 2.331569E+01 | loss scale: 4096.0 | grad norm: 276743.182 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 15/ 292968 | consumed samples: 30720 | consumed tokens: 1966080 | elapsed time per iteration (ms): 86247.0 | learning rate: 8.521E-06 | global batch size: 2048 | lm loss: 2.094402E+01 
| loss scale: 4096.0 | grad norm: 226314.869 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 16/ 292968 | consumed samples: 32768 | consumed tokens: 2097152 | elapsed time per iteration (ms): 86013.9 | learning rate: 9.089E-06 | global batch size: 2048 | lm loss: 1.969643E+01 | loss scale: 4096.0 | grad norm: 135309.147 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 17/ 292968 | consumed samples: 34816 | consumed tokens: 2228224 | elapsed time per iteration (ms): 86000.3 | learning rate: 9.657E-06 | global batch size: 2048 | lm loss: 1.816238E+01 | loss scale: 4096.0 | grad norm: 74699.814 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 18/ 292968 | consumed samples: 36864 | consumed tokens: 2359296 | elapsed time per iteration (ms): 85741.8 | learning rate: 1.022E-05 | global batch size: 2048 | lm loss: 1.715309E+01 | loss scale: 4096.0 | grad norm: 43055.680 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 19/ 292968 | consumed samples: 38912 | consumed tokens: 2490368 | elapsed time per iteration (ms): 86363.7 | learning rate: 1.079E-05 | global batch size: 2048 | lm loss: 1.587515E+01 | loss scale: 4096.0 | grad norm: 40328.680 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 20/ 292968 | consumed samples: 40960 | consumed tokens: 2621440 | elapsed time per iteration (ms): 87039.7 | learning rate: 1.136E-05 | global batch size: 2048 | lm loss: 1.445321E+01 | loss scale: 4096.0 | grad norm: 178516.421 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 21/ 292968 | consumed samples: 43008 | consumed tokens: 2752512 | elapsed time per iteration (ms): 86563.9 | learning rate: 1.193E-05 | global batch size: 2048 | lm loss: 1.723314E+01 | loss scale: 4096.0 | grad norm: 467676.180 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 22/ 292968 | consumed samples: 45056 | consumed tokens: 2883584 | elapsed time per iteration (ms): 86929.8 | learning rate: 1.250E-05 | global batch size: 2048 | lm loss: 1.384353E+01 | loss scale: 4096.0 | grad norm: 349625.568 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 23/ 292968 | consumed samples: 47104 | consumed tokens: 3014656 | elapsed time per iteration (ms): 86274.0 | learning rate: 1.307E-05 | global batch size: 2048 | lm loss: 1.433385E+01 | loss scale: 4096.0 | grad norm: 295627.439 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 24/ 292968 | consumed samples: 49152 | consumed tokens: 3145728 | elapsed time per iteration (ms): 87804.9 | learning rate: 1.363E-05 | global batch size: 2048 | lm loss: 1.566444E+01 | loss scale: 4096.0 | grad norm: 426731.939 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 25/ 292968 | consumed samples: 51200 | consumed tokens: 3276800 | elapsed time per iteration (ms): 86109.0 | learning rate: 1.420E-05 | global batch 
size: 2048 | lm loss: 1.351891E+01 | loss scale: 4096.0 | grad norm: 214665.644 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 26/ 292968 | consumed samples: 53248 | consumed tokens: 3407872 | elapsed time per iteration (ms): 86387.3 | learning rate: 1.477E-05 | global batch size: 2048 | lm loss: 1.299350E+01 | loss scale: 4096.0 | grad norm: 196219.543 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 27/ 292968 | consumed samples: 55296 | consumed tokens: 3538944 | elapsed time per iteration (ms): 85245.0 | learning rate: 1.534E-05 | global batch size: 2048 | lm loss: 1.253081E+01 | loss scale: 4096.0 | grad norm: 40435.746 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 28/ 292968 | consumed samples: 57344 | consumed tokens: 3670016 | elapsed time per iteration (ms): 86509.8 | learning rate: 1.591E-05 | global batch size: 2048 | lm loss: 1.233641E+01 | loss scale: 4096.0 | grad norm: 59434.881 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 29/ 292968 | consumed samples: 59392 | consumed tokens: 3801088 | elapsed time per iteration (ms): 86102.6 | learning rate: 1.647E-05 | global batch size: 2048 | lm loss: 1.230502E+01 | loss scale: 4096.0 | grad norm: 83241.888 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 30/ 292968 | consumed samples: 61440 | consumed tokens: 3932160 | elapsed time per iteration (ms): 85456.0 | learning rate: 1.704E-05 | global batch size: 2048 | lm loss: 1.178389E+01 | loss scale: 4096.0 | grad norm: 34948.162 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 31/ 292968 | consumed samples: 63488 | consumed tokens: 4063232 | elapsed time per iteration (ms): 86188.5 | learning rate: 1.761E-05 | global batch size: 2048 | lm loss: 1.131446E+01 | loss scale: 4096.0 | grad norm: 33246.558 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 32/ 292968 | consumed samples: 65536 | consumed tokens: 4194304 | elapsed time per iteration (ms): 85866.1 | learning rate: 1.818E-05 | global batch size: 2048 | lm loss: 1.087723E+01 | loss scale: 4096.0 | grad norm: 62673.048 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 33/ 292968 | consumed samples: 67584 | consumed tokens: 4325376 | elapsed time per iteration (ms): 85043.8 | learning rate: 1.875E-05 | global batch size: 2048 | lm loss: 1.036173E+01 | loss scale: 4096.0 | grad norm: 53524.152 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) + iteration 34/ 292968 | consumed samples: 69632 | consumed tokens: 4456448 | elapsed time per iteration (ms): 84939.6 | learning rate: 1.931E-05 | global batch size: 2048 | lm loss: 9.918847E+00 | loss scale: 4096.0 | grad norm: 59973.909 | num zeros: 0.0 | curriculum seqlen: 64 | number of skipped iterations: 0 | number of nan iterations: 0 | +time (ms) +saving checkpoint at iteration 34 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints +[2021-10-21 22:21:59,159] 
[INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/mp_rank_00_model_states.pt +[2021-10-21 22:21:59,294] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/mp_rank_01_model_states.pt +[2021-10-21 22:22:12,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_44_optim_states.pt +[2021-10-21 22:22:12,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_18_optim_states.pt +[2021-10-21 22:22:12,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_45_optim_states.pt +[2021-10-21 22:22:12,246] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_13_optim_states.pt +[2021-10-21 22:22:12,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_121_optim_states.pt +[2021-10-21 22:22:12,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_117_optim_states.pt +[2021-10-21 22:22:12,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_28_optim_states.pt +[2021-10-21 22:22:12,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_33_optim_states.pt +[2021-10-21 22:22:12,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_51_optim_states.pt +[2021-10-21 22:22:12,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_41_optim_states.pt +[2021-10-21 22:22:12,399] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_19_optim_states.pt +[2021-10-21 22:22:12,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_37_optim_states.pt +[2021-10-21 22:22:12,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_119_optim_states.pt +[2021-10-21 22:22:12,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_42_optim_states.pt +[2021-10-21 22:22:12,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_35_optim_states.pt +[2021-10-21 22:22:12,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_11_optim_states.pt +[2021-10-21 22:22:12,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_05_optim_states.pt +[2021-10-21 22:22:12,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_88_optim_states.pt +[2021-10-21 22:22:12,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_14_optim_states.pt +[2021-10-21 22:22:12,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_39_optim_states.pt +[2021-10-21 22:22:12,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_122_optim_states.pt +[2021-10-21 22:22:12,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_10_optim_states.pt +[2021-10-21 22:22:12,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_31_optim_states.pt +[2021-10-21 22:22:12,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_92_optim_states.pt +[2021-10-21 22:22:12,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_48_optim_states.pt +[2021-10-21 22:22:12,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_90_optim_states.pt +[2021-10-21 22:22:12,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_50_optim_states.pt +[2021-10-21 22:22:12,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_06_optim_states.pt +[2021-10-21 22:22:13,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_86_optim_states.pt +[2021-10-21 22:22:13,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_36_optim_states.pt +[2021-10-21 22:22:13,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_49_optim_states.pt 
+[2021-10-21 22:22:13,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_16_optim_states.pt +[2021-10-21 22:22:13,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_118_optim_states.pt +[2021-10-21 22:22:13,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_82_optim_states.pt +[2021-10-21 22:22:13,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_17_optim_states.pt +[2021-10-21 22:22:13,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_94_optim_states.pt +[2021-10-21 22:22:13,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_38_optim_states.pt +[2021-10-21 22:22:13,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_43_optim_states.pt +[2021-10-21 22:22:13,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_57_optim_states.pt +[2021-10-21 22:22:13,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_63_optim_states.pt +[2021-10-21 22:22:13,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_80_optim_states.pt +[2021-10-21 22:22:13,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_55_optim_states.pt +[2021-10-21 22:22:13,460] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_15_optim_states.pt +[2021-10-21 22:22:13,463] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_73_optim_states.pt +[2021-10-21 22:22:13,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_74_optim_states.pt +[2021-10-21 22:22:13,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_46_optim_states.pt +[2021-10-21 22:22:13,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_25_optim_states.pt +[2021-10-21 22:22:13,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_64_optim_states.pt +[2021-10-21 22:22:13,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_08_optim_states.pt +[2021-10-21 22:22:13,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_69_optim_states.pt +[2021-10-21 22:22:13,517] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_61_optim_states.pt +[2021-10-21 22:22:13,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_84_optim_states.pt +[2021-10-21 22:22:13,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_66_optim_states.pt +[2021-10-21 22:22:13,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_77_optim_states.pt +[2021-10-21 22:22:13,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_09_optim_states.pt +[2021-10-21 22:22:13,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_104_optim_states.pt +[2021-10-21 22:22:13,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_47_optim_states.pt +[2021-10-21 22:22:13,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_123_optim_states.pt +[2021-10-21 22:22:13,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_116_optim_states.pt +[2021-10-21 22:22:13,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_12_optim_states.pt +[2021-10-21 22:22:13,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_76_optim_states.pt +[2021-10-21 22:22:13,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_70_optim_states.pt +[2021-10-21 22:22:13,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_59_optim_states.pt +[2021-10-21 22:22:13,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_07_optim_states.pt 
+[2021-10-21 22:22:13,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_101_optim_states.pt
+[2021-10-21 22:22:13,755] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_26_optim_states.pt
+[2021-10-21 22:22:13,763] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_99_optim_states.pt
+[2021-10-21 22:22:13,769] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_98_optim_states.pt
+[2021-10-21 22:22:13,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_112_optim_states.pt
+[2021-10-21 22:22:13,803] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_40_optim_states.pt
+[2021-10-21 22:22:13,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_106_optim_states.pt
+[2021-10-21 22:22:13,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_102_optim_states.pt
+[2021-10-21 22:22:13,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_04_optim_states.pt
+[2021-10-21 22:22:13,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_115_optim_states.pt
+[2021-10-21 22:22:13,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_109_optim_states.pt
+[2021-10-21 22:22:13,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_52_optim_states.pt
+[2021-10-21 22:22:13,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_85_optim_states.pt
+[2021-10-21 22:22:13,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_58_optim_states.pt
+[2021-10-21 22:22:13,998] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_108_optim_states.pt
+[2021-10-21 22:22:14,061] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_62_optim_states.pt
+[2021-10-21 22:22:14,130] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_71_optim_states.pt
+[2021-10-21 22:22:14,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_83_optim_states.pt
+[2021-10-21 22:22:14,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_68_optim_states.pt
+[2021-10-21 22:22:14,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_81_optim_states.pt
+[2021-10-21 22:22:14,252] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_60_optim_states.pt
+[2021-10-21 22:22:14,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_87_optim_states.pt
+[2021-10-21 22:22:14,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_79_optim_states.pt
+[2021-10-21 22:22:14,346] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_78_optim_states.pt
+[2021-10-21 22:22:14,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_56_optim_states.pt
+[2021-10-21 22:22:14,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_114_optim_states.pt
+[2021-10-21 22:22:14,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_113_optim_states.pt
+[2021-10-21 22:22:14,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_97_optim_states.pt
+[2021-10-21 22:22:14,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_96_optim_states.pt
+[2021-10-21 22:22:14,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_125_optim_states.pt
+[2021-10-21 22:22:14,983] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_120_optim_states.pt
+[2021-10-21 22:22:15,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_126_optim_states.pt
+[2021-10-21 22:22:16,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_127_optim_states.pt
+[2021-10-21 22:22:16,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_103_optim_states.pt
+[2021-10-21 22:22:16,391] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_01_optim_states.pt
+[2021-10-21 22:22:16,448] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_03_optim_states.pt
+[2021-10-21 22:22:16,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_75_optim_states.pt
+[2021-10-21 22:22:16,772] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_124_optim_states.pt
+[2021-10-21 22:22:17,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_100_optim_states.pt
+[2021-10-21 22:22:17,747] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_72_optim_states.pt
+[2021-10-21 22:22:18,065] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_91_optim_states.pt
+[2021-10-21 22:22:19,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_89_optim_states.pt
+[2021-10-21 22:22:20,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_27_optim_states.pt
+[2021-10-21 22:22:21,091] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_24_optim_states.pt
+[2021-10-21 22:22:21,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_20_optim_states.pt
+[2021-10-21 22:22:21,724] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_23_optim_states.pt
+[2021-10-21 22:22:21,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_02_optim_states.pt
+[2021-10-21 22:22:22,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_54_optim_states.pt
+[2021-10-21 22:22:22,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_53_optim_states.pt
+[2021-10-21 22:22:23,042] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_00_optim_states.pt
+[2021-10-21 22:22:23,044] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_30_optim_states.pt
+[2021-10-21 22:22:23,507] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_34_optim_states.pt
+[2021-10-21 22:22:23,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_95_optim_states.pt
+[2021-10-21 22:22:23,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_21_optim_states.pt
+[2021-10-21 22:22:24,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_67_optim_states.pt
+[2021-10-21 22:22:24,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_22_optim_states.pt
+[2021-10-21 22:22:24,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_32_optim_states.pt
+[2021-10-21 22:22:24,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_29_optim_states.pt
+[2021-10-21 22:22:25,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_105_optim_states.pt
+[2021-10-21 22:22:25,150] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_93_optim_states.pt
+[2021-10-21 22:22:25,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_65_optim_states.pt
+[2021-10-21 22:22:25,855] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_110_optim_states.pt
+[2021-10-21 22:22:26,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_107_optim_states.pt
+[2021-10-21 22:22:26,865] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints/global_step34/zero_pp_rank_0_mp_rank_111_optim_states.pt
+ successfully saved checkpoint at iteration 34 to /gpfsscratch/rech/six/commun/checkpoints/tr8b-104B/checkpoints
+time (ms) | save-checkpoint: 30665.91
+[exiting program after 55.0033370534579 minutes] datetime: 2021-10-21 22:22:26