diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1304 @@
+[2023-11-22 03:55:25,159][05156] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2023-11-22 03:55:25,165][05156] Rollout worker 0 uses device cpu
+[2023-11-22 03:55:25,168][05156] Rollout worker 1 uses device cpu
+[2023-11-22 03:55:25,169][05156] Rollout worker 2 uses device cpu
+[2023-11-22 03:55:25,171][05156] Rollout worker 3 uses device cpu
+[2023-11-22 03:55:25,172][05156] Rollout worker 4 uses device cpu
+[2023-11-22 03:55:25,173][05156] Rollout worker 5 uses device cpu
+[2023-11-22 03:55:25,175][05156] Rollout worker 6 uses device cpu
+[2023-11-22 03:55:25,176][05156] Rollout worker 7 uses device cpu
+[2023-11-22 03:55:25,244][05156] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-11-22 03:55:25,246][05156] InferenceWorker_p0-w0: min num requests: 2
+[2023-11-22 03:55:25,276][05156] Starting all processes...
+[2023-11-22 03:55:25,277][05156] Starting process learner_proc0
+[2023-11-22 03:55:25,334][05156] Starting all processes...
+[2023-11-22 03:55:25,342][05156] Starting process inference_proc0-0
+[2023-11-22 03:55:25,342][05156] Starting process rollout_proc0
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc1
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc2
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc3
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc4
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc5
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc6
+[2023-11-22 03:55:25,344][05156] Starting process rollout_proc7
+[2023-11-22 03:55:43,001][06878] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-11-22 03:55:43,002][06878] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-11-22 03:55:43,071][06878] Num visible devices: 1
+[2023-11-22 03:55:43,083][06896] Worker 4 uses CPU cores [0]
+[2023-11-22 03:55:43,085][06898] Worker 6 uses CPU cores [0]
+[2023-11-22 03:55:43,096][06897] Worker 5 uses CPU cores [1]
+[2023-11-22 03:55:43,101][06878] Starting seed is not provided
+[2023-11-22 03:55:43,101][06878] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-11-22 03:55:43,101][06878] Initializing actor-critic model on device cuda:0
+[2023-11-22 03:55:43,102][06878] RunningMeanStd input shape: (3, 72, 128)
+[2023-11-22 03:55:43,103][06892] Worker 0 uses CPU cores [0]
+[2023-11-22 03:55:43,103][06899] Worker 7 uses CPU cores [1]
+[2023-11-22 03:55:43,108][06878] RunningMeanStd input shape: (1,)
+[2023-11-22 03:55:43,112][06894] Worker 2 uses CPU cores [0]
+[2023-11-22 03:55:43,134][06891] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-11-22 03:55:43,135][06891] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-11-22 03:55:43,150][06891] Num visible devices: 1
+[2023-11-22 03:55:43,162][06878] ConvEncoder: input_channels=3
+[2023-11-22 03:55:43,189][06895] Worker 3 uses CPU cores [1]
+[2023-11-22 03:55:43,240][06893] Worker 1 uses CPU cores [1]
+[2023-11-22 03:55:43,299][06878] Conv encoder output size: 512
+[2023-11-22 03:55:43,300][06878] Policy head output size: 512
+[2023-11-22 03:55:43,314][06878] Created Actor Critic model with architecture:
+[2023-11-22 03:55:43,314][06878] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2023-11-22 03:55:43,434][06878] Using optimizer
+[2023-11-22 03:55:44,393][06878] No checkpoints found
+[2023-11-22 03:55:44,394][06878] Did not load from checkpoint, starting from scratch!
+[2023-11-22 03:55:44,394][06878] Initialized policy 0 weights for model version 0
+[2023-11-22 03:55:44,397][06878] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-11-22 03:55:44,402][06878] LearnerWorker_p0 finished initialization!
+[2023-11-22 03:55:44,487][06891] RunningMeanStd input shape: (3, 72, 128)
+[2023-11-22 03:55:44,488][06891] RunningMeanStd input shape: (1,)
+[2023-11-22 03:55:44,500][06891] ConvEncoder: input_channels=3
+[2023-11-22 03:55:44,610][06891] Conv encoder output size: 512
+[2023-11-22 03:55:44,611][06891] Policy head output size: 512
+[2023-11-22 03:55:44,702][05156] Inference worker 0-0 is ready!
+[2023-11-22 03:55:44,704][05156] All inference workers are ready! Signal rollout workers to start!
+[2023-11-22 03:55:44,895][06894] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:44,897][06892] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:44,899][06896] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:44,903][06898] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:45,062][06893] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:45,056][06895] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:45,061][06897] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:45,067][06899] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-11-22 03:55:45,238][05156] Heartbeat connected on Batcher_0
+[2023-11-22 03:55:45,241][05156] Heartbeat connected on LearnerWorker_p0
+[2023-11-22 03:55:45,282][05156] Heartbeat connected on InferenceWorker_p0-w0
+[2023-11-22 03:55:46,195][05156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-11-22 03:55:46,612][06894] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:46,630][06892] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:46,621][06898] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:46,656][06897] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:46,659][06895] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:47,764][06899] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:47,783][06895] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:48,711][06896] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:48,719][06892] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:48,714][06898] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:48,822][06894] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:50,144][06893] Decorrelating experience for 0 frames...
+[2023-11-22 03:55:50,163][06899] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:50,623][06895] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:50,959][06897] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:51,054][06898] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:51,109][06894] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:51,195][05156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-11-22 03:55:51,902][06896] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:52,309][06892] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:52,585][06895] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:52,715][06899] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:52,850][05156] Heartbeat connected on RolloutWorker_w3
+[2023-11-22 03:55:53,091][06898] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:53,533][05156] Heartbeat connected on RolloutWorker_w6
+[2023-11-22 03:55:54,926][06896] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:55,101][06892] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:55,428][06894] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:55,604][06893] Decorrelating experience for 32 frames...
+[2023-11-22 03:55:55,675][05156] Heartbeat connected on RolloutWorker_w0
+[2023-11-22 03:55:56,032][05156] Heartbeat connected on RolloutWorker_w2
+[2023-11-22 03:55:56,196][05156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-11-22 03:55:56,197][05156] Avg episode reward: [(0, '1.927')]
+[2023-11-22 03:55:56,242][06897] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:56,368][06899] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:56,666][05156] Heartbeat connected on RolloutWorker_w7
+[2023-11-22 03:55:58,593][06893] Decorrelating experience for 64 frames...
+[2023-11-22 03:55:58,926][06896] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:58,937][06897] Decorrelating experience for 96 frames...
+[2023-11-22 03:55:59,566][05156] Heartbeat connected on RolloutWorker_w4
+[2023-11-22 03:55:59,568][05156] Heartbeat connected on RolloutWorker_w5
+[2023-11-22 03:55:59,896][06878] Signal inference workers to stop experience collection...
+[2023-11-22 03:55:59,909][06891] InferenceWorker_p0-w0: stopping experience collection
+[2023-11-22 03:56:00,160][06893] Decorrelating experience for 96 frames...
+[2023-11-22 03:56:00,242][05156] Heartbeat connected on RolloutWorker_w1
+[2023-11-22 03:56:00,793][06878] Signal inference workers to resume experience collection...
+[2023-11-22 03:56:00,794][06891] InferenceWorker_p0-w0: resuming experience collection
+[2023-11-22 03:56:01,195][05156] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 158.4. Samples: 2376. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-11-22 03:56:01,198][05156] Avg episode reward: [(0, '2.925')]
+[2023-11-22 03:56:06,197][05156] Fps is (10 sec: 2457.3, 60 sec: 1228.7, 300 sec: 1228.7). Total num frames: 24576. Throughput: 0: 340.4. Samples: 6808. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-11-22 03:56:06,200][05156] Avg episode reward: [(0, '3.772')]
+[2023-11-22 03:56:11,195][05156] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 352.2. Samples: 8806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:56:11,202][05156] Avg episode reward: [(0, '3.840')]
+[2023-11-22 03:56:11,716][06891] Updated weights for policy 0, policy_version 10 (0.0021)
+[2023-11-22 03:56:16,195][05156] Fps is (10 sec: 2458.0, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 422.9. Samples: 12688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:56:16,203][05156] Avg episode reward: [(0, '4.327')]
+[2023-11-22 03:56:21,196][05156] Fps is (10 sec: 2457.4, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 470.2. Samples: 16458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:56:21,198][05156] Avg episode reward: [(0, '4.323')]
+[2023-11-22 03:56:26,195][05156] Fps is (10 sec: 2867.2, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 464.7. Samples: 18588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:56:26,198][05156] Avg episode reward: [(0, '4.465')]
+[2023-11-22 03:56:26,357][06891] Updated weights for policy 0, policy_version 20 (0.0027)
+[2023-11-22 03:56:31,195][05156] Fps is (10 sec: 3686.7, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 541.6. Samples: 24372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 03:56:31,198][05156] Avg episode reward: [(0, '4.436')]
+[2023-11-22 03:56:36,197][05156] Fps is (10 sec: 3685.7, 60 sec: 2293.7, 300 sec: 2293.7). Total num frames: 114688. Throughput: 0: 649.4. Samples: 29224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:56:36,204][05156] Avg episode reward: [(0, '4.324')]
+[2023-11-22 03:56:36,210][06878] Saving new best policy, reward=4.324!
+[2023-11-22 03:56:38,850][06891] Updated weights for policy 0, policy_version 30 (0.0019)
+[2023-11-22 03:56:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2308.7, 300 sec: 2308.7). Total num frames: 126976. Throughput: 0: 691.3. Samples: 31128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:56:41,198][05156] Avg episode reward: [(0, '4.422')]
+[2023-11-22 03:56:41,216][06878] Saving new best policy, reward=4.422!
+[2023-11-22 03:56:46,196][05156] Fps is (10 sec: 2457.9, 60 sec: 2321.0, 300 sec: 2321.0). Total num frames: 139264. Throughput: 0: 722.3. Samples: 34882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:56:46,199][05156] Avg episode reward: [(0, '4.500')]
+[2023-11-22 03:56:46,204][06878] Saving new best policy, reward=4.500!
+[2023-11-22 03:56:51,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2394.6). Total num frames: 155648. Throughput: 0: 725.6. Samples: 39458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:56:51,198][05156] Avg episode reward: [(0, '4.462')]
+[2023-11-22 03:56:52,718][06891] Updated weights for policy 0, policy_version 40 (0.0033)
+[2023-11-22 03:56:56,195][05156] Fps is (10 sec: 3686.6, 60 sec: 2935.5, 300 sec: 2516.1). Total num frames: 176128. Throughput: 0: 748.0. Samples: 42464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:56:56,202][05156] Avg episode reward: [(0, '4.405')]
+[2023-11-22 03:57:01,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2566.8). Total num frames: 192512. Throughput: 0: 781.9. Samples: 47872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:57:01,205][05156] Avg episode reward: [(0, '4.348')]
+[2023-11-22 03:57:05,879][06891] Updated weights for policy 0, policy_version 50 (0.0016)
+[2023-11-22 03:57:06,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2560.0). Total num frames: 204800. Throughput: 0: 780.5. Samples: 51582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:57:06,198][05156] Avg episode reward: [(0, '4.461')]
+[2023-11-22 03:57:11,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2554.0). Total num frames: 217088. Throughput: 0: 774.5. Samples: 53440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 03:57:11,198][05156] Avg episode reward: [(0, '4.392')]
+[2023-11-22 03:57:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2548.6). Total num frames: 229376. Throughput: 0: 730.8. Samples: 57256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:57:16,204][05156] Avg episode reward: [(0, '4.354')]
+[2023-11-22 03:57:19,550][06891] Updated weights for policy 0, policy_version 60 (0.0020)
+[2023-11-22 03:57:21,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2630.1). Total num frames: 249856. Throughput: 0: 753.2. Samples: 63118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:57:21,202][05156] Avg episode reward: [(0, '4.367')]
+[2023-11-22 03:57:21,213][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000061_249856.pth...
+[2023-11-22 03:57:26,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2662.4). Total num frames: 266240. Throughput: 0: 775.2. Samples: 66010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:57:26,198][05156] Avg episode reward: [(0, '4.447')]
+[2023-11-22 03:57:31,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2652.6). Total num frames: 278528. Throughput: 0: 778.6. Samples: 69920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:57:31,199][05156] Avg episode reward: [(0, '4.333')]
+[2023-11-22 03:57:33,633][06891] Updated weights for policy 0, policy_version 70 (0.0013)
+[2023-11-22 03:57:36,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2935.5, 300 sec: 2643.7). Total num frames: 290816. Throughput: 0: 759.4. Samples: 73634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 03:57:36,200][05156] Avg episode reward: [(0, '4.436')]
+[2023-11-22 03:57:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2671.3). Total num frames: 307200. Throughput: 0: 734.9. Samples: 75534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 03:57:41,200][05156] Avg episode reward: [(0, '4.325')]
+[2023-11-22 03:57:46,195][05156] Fps is (10 sec: 3277.3, 60 sec: 3072.0, 300 sec: 2696.5). Total num frames: 323584. Throughput: 0: 735.2. Samples: 80958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:57:46,203][05156] Avg episode reward: [(0, '4.200')]
+[2023-11-22 03:57:46,211][06891] Updated weights for policy 0, policy_version 80 (0.0018)
+[2023-11-22 03:57:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2752.5). Total num frames: 344064. Throughput: 0: 780.4. Samples: 86702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:57:51,205][05156] Avg episode reward: [(0, '4.222')]
+[2023-11-22 03:57:56,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2741.2). Total num frames: 356352. Throughput: 0: 779.7. Samples: 88528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:57:56,198][05156] Avg episode reward: [(0, '4.314')]
+[2023-11-22 03:58:00,086][06891] Updated weights for policy 0, policy_version 90 (0.0017)
+[2023-11-22 03:58:01,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2935.5, 300 sec: 2730.7). Total num frames: 368640. Throughput: 0: 780.6. Samples: 92382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:58:01,204][05156] Avg episode reward: [(0, '4.416')]
+[2023-11-22 03:58:06,195][05156] Fps is (10 sec: 2457.7, 60 sec: 2935.5, 300 sec: 2720.9). Total num frames: 380928. Throughput: 0: 733.6. Samples: 96132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:58:06,203][05156] Avg episode reward: [(0, '4.405')]
+[2023-11-22 03:58:11,195][05156] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 2768.3). Total num frames: 401408. Throughput: 0: 733.9. Samples: 99036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:11,202][05156] Avg episode reward: [(0, '4.574')]
+[2023-11-22 03:58:11,213][06878] Saving new best policy, reward=4.574!
+[2023-11-22 03:58:12,693][06891] Updated weights for policy 0, policy_version 100 (0.0019)
+[2023-11-22 03:58:16,198][05156] Fps is (10 sec: 3685.4, 60 sec: 3140.1, 300 sec: 2785.2). Total num frames: 417792. Throughput: 0: 777.8. Samples: 104922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:16,200][05156] Avg episode reward: [(0, '4.580')]
+[2023-11-22 03:58:16,204][06878] Saving new best policy, reward=4.580!
+[2023-11-22 03:58:21,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2774.7). Total num frames: 430080. Throughput: 0: 766.0. Samples: 108102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:58:21,198][05156] Avg episode reward: [(0, '4.412')]
+[2023-11-22 03:58:26,195][05156] Fps is (10 sec: 2048.5, 60 sec: 2867.2, 300 sec: 2739.2). Total num frames: 438272. Throughput: 0: 757.4. Samples: 109618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:26,204][05156] Avg episode reward: [(0, '4.404')]
+[2023-11-22 03:58:31,195][05156] Fps is (10 sec: 1638.4, 60 sec: 2798.9, 300 sec: 2705.8). Total num frames: 446464. Throughput: 0: 698.1. Samples: 112374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:31,209][05156] Avg episode reward: [(0, '4.585')]
+[2023-11-22 03:58:31,227][06878] Saving new best policy, reward=4.585!
+[2023-11-22 03:58:31,258][06891] Updated weights for policy 0, policy_version 110 (0.0015)
+[2023-11-22 03:58:36,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2799.0, 300 sec: 2698.5). Total num frames: 458752. Throughput: 0: 633.6. Samples: 115214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 03:58:36,198][05156] Avg episode reward: [(0, '4.479')]
+[2023-11-22 03:58:41,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2668.2). Total num frames: 466944. Throughput: 0: 618.7. Samples: 116368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:41,202][05156] Avg episode reward: [(0, '4.517')]
+[2023-11-22 03:58:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2685.2). Total num frames: 483328. Throughput: 0: 629.8. Samples: 120724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:46,198][05156] Avg episode reward: [(0, '4.517')]
+[2023-11-22 03:58:47,350][06891] Updated weights for policy 0, policy_version 120 (0.0029)
+[2023-11-22 03:58:51,196][05156] Fps is (10 sec: 3686.2, 60 sec: 2662.4, 300 sec: 2723.3). Total num frames: 503808. Throughput: 0: 673.1. Samples: 126420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:58:51,202][05156] Avg episode reward: [(0, '4.571')]
+[2023-11-22 03:58:56,197][05156] Fps is (10 sec: 3276.3, 60 sec: 2662.3, 300 sec: 2716.3). Total num frames: 516096. Throughput: 0: 646.8. Samples: 128144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:58:56,202][05156] Avg episode reward: [(0, '4.518')]
+[2023-11-22 03:59:01,199][05156] Fps is (10 sec: 2456.9, 60 sec: 2662.3, 300 sec: 2709.6). Total num frames: 528384. Throughput: 0: 598.7. Samples: 131864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:59:01,201][05156] Avg episode reward: [(0, '4.593')]
+[2023-11-22 03:59:01,221][06878] Saving new best policy, reward=4.593!
+[2023-11-22 03:59:02,667][06891] Updated weights for policy 0, policy_version 130 (0.0019)
+[2023-11-22 03:59:06,198][05156] Fps is (10 sec: 2457.4, 60 sec: 2662.3, 300 sec: 2703.3). Total num frames: 540672. Throughput: 0: 608.9. Samples: 135502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:59:06,205][05156] Avg episode reward: [(0, '4.560')]
+[2023-11-22 03:59:11,195][05156] Fps is (10 sec: 2868.2, 60 sec: 2594.1, 300 sec: 2717.3). Total num frames: 557056. Throughput: 0: 632.2. Samples: 138068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:59:11,199][05156] Avg episode reward: [(0, '4.519')]
+[2023-11-22 03:59:14,855][06891] Updated weights for policy 0, policy_version 140 (0.0027)
+[2023-11-22 03:59:16,195][05156] Fps is (10 sec: 3687.3, 60 sec: 2662.5, 300 sec: 2750.2). Total num frames: 577536. Throughput: 0: 699.9. Samples: 143870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:59:16,198][05156] Avg episode reward: [(0, '4.445')]
+[2023-11-22 03:59:21,196][05156] Fps is (10 sec: 3276.6, 60 sec: 2662.4, 300 sec: 2743.4). Total num frames: 589824. Throughput: 0: 731.9. Samples: 148150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:59:21,198][05156] Avg episode reward: [(0, '4.493')]
+[2023-11-22 03:59:21,214][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000144_589824.pth...
+[2023-11-22 03:59:26,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2730.6, 300 sec: 2736.9). Total num frames: 602112. Throughput: 0: 744.5. Samples: 149872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-11-22 03:59:26,199][05156] Avg episode reward: [(0, '4.466')]
+[2023-11-22 03:59:31,000][06891] Updated weights for policy 0, policy_version 150 (0.0015)
+[2023-11-22 03:59:31,195][05156] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2730.7). Total num frames: 614400. Throughput: 0: 728.8. Samples: 153522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 03:59:31,202][05156] Avg episode reward: [(0, '4.427')]
+[2023-11-22 03:59:36,195][05156] Fps is (10 sec: 2867.3, 60 sec: 2867.2, 300 sec: 2742.5). Total num frames: 630784. Throughput: 0: 707.8. Samples: 158270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 03:59:36,206][05156] Avg episode reward: [(0, '4.415')]
+[2023-11-22 03:59:41,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2771.3). Total num frames: 651264. Throughput: 0: 733.3. Samples: 161142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 03:59:41,198][05156] Avg episode reward: [(0, '4.517')]
+[2023-11-22 03:59:41,988][06891] Updated weights for policy 0, policy_version 160 (0.0017)
+[2023-11-22 03:59:46,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2764.8). Total num frames: 663552. Throughput: 0: 761.8. Samples: 166142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:59:46,198][05156] Avg episode reward: [(0, '4.493')]
+[2023-11-22 03:59:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2758.5). Total num frames: 675840. Throughput: 0: 762.3. Samples: 169804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 03:59:51,202][05156] Avg episode reward: [(0, '4.483')]
+[2023-11-22 03:59:56,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 2752.5). Total num frames: 688128. Throughput: 0: 743.9. Samples: 171542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 03:59:56,201][05156] Avg episode reward: [(0, '4.430')]
+[2023-11-22 03:59:58,550][06891] Updated weights for policy 0, policy_version 170 (0.0029)
+[2023-11-22 04:00:01,210][05156] Fps is (10 sec: 2863.1, 60 sec: 2934.9, 300 sec: 2762.6). Total num frames: 704512. Throughput: 0: 705.2. Samples: 175614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:00:01,212][05156] Avg episode reward: [(0, '4.329')]
+[2023-11-22 04:00:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.9, 300 sec: 2772.7). Total num frames: 720896. Throughput: 0: 735.5. Samples: 181246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:00:06,202][05156] Avg episode reward: [(0, '4.425')]
+[2023-11-22 04:00:10,164][06891] Updated weights for policy 0, policy_version 180 (0.0047)
+[2023-11-22 04:00:11,195][05156] Fps is (10 sec: 3281.5, 60 sec: 3003.7, 300 sec: 2782.2). Total num frames: 737280. Throughput: 0: 756.3. Samples: 183906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:00:11,200][05156] Avg episode reward: [(0, '4.550')]
+[2023-11-22 04:00:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2776.2). Total num frames: 749568. Throughput: 0: 755.9. Samples: 187536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:00:16,201][05156] Avg episode reward: [(0, '4.637')]
+[2023-11-22 04:00:16,206][06878] Saving new best policy, reward=4.637!
+[2023-11-22 04:00:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2770.4). Total num frames: 761856. Throughput: 0: 731.5. Samples: 191188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:00:21,201][05156] Avg episode reward: [(0, '4.630')]
+[2023-11-22 04:00:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2764.8). Total num frames: 774144. Throughput: 0: 707.4. Samples: 192976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:00:26,197][05156] Avg episode reward: [(0, '4.739')]
+[2023-11-22 04:00:26,206][06878] Saving new best policy, reward=4.739!
+[2023-11-22 04:00:26,499][06891] Updated weights for policy 0, policy_version 190 (0.0022)
+[2023-11-22 04:00:31,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2788.2). Total num frames: 794624. Throughput: 0: 714.7. Samples: 198304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:00:31,204][05156] Avg episode reward: [(0, '4.658')]
+[2023-11-22 04:00:36,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2796.6). Total num frames: 811008. Throughput: 0: 748.1. Samples: 203470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:00:36,202][05156] Avg episode reward: [(0, '4.511')]
+[2023-11-22 04:00:38,687][06891] Updated weights for policy 0, policy_version 200 (0.0027)
+[2023-11-22 04:00:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2790.8). Total num frames: 823296. Throughput: 0: 748.0. Samples: 205202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:00:41,198][05156] Avg episode reward: [(0, '4.491')]
+[2023-11-22 04:00:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2832.5). Total num frames: 835584. Throughput: 0: 737.9. Samples: 208810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:00:46,200][05156] Avg episode reward: [(0, '4.425')]
+[2023-11-22 04:00:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2874.1). Total num frames: 847872. Throughput: 0: 692.2. Samples: 212394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:00:51,203][05156] Avg episode reward: [(0, '4.399')]
+[2023-11-22 04:00:54,114][06891] Updated weights for policy 0, policy_version 210 (0.0025)
+[2023-11-22 04:00:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 864256. Throughput: 0: 693.6. Samples: 215116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:00:56,198][05156] Avg episode reward: [(0, '4.581')]
+[2023-11-22 04:01:01,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3004.4, 300 sec: 2915.8). Total num frames: 884736. Throughput: 0: 740.0. Samples: 220836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:01:01,198][05156] Avg episode reward: [(0, '4.693')]
+[2023-11-22 04:01:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 897024. Throughput: 0: 742.7. Samples: 224610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:01:06,203][05156] Avg episode reward: [(0, '4.592')]
+[2023-11-22 04:01:07,697][06891] Updated weights for policy 0, policy_version 220 (0.0036)
+[2023-11-22 04:01:11,199][05156] Fps is (10 sec: 2456.8, 60 sec: 2867.0, 300 sec: 2915.8). Total num frames: 909312. Throughput: 0: 743.5. Samples: 226434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:01:11,206][05156] Avg episode reward: [(0, '4.627')]
+[2023-11-22 04:01:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 921600. Throughput: 0: 704.2. Samples: 229994. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-11-22 04:01:16,201][05156] Avg episode reward: [(0, '4.506')]
+[2023-11-22 04:01:21,195][05156] Fps is (10 sec: 2868.2, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 937984. Throughput: 0: 702.2. Samples: 235068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:01:21,204][05156] Avg episode reward: [(0, '4.514')]
+[2023-11-22 04:01:21,219][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000229_937984.pth...
+[2023-11-22 04:01:21,371][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000061_249856.pth +[2023-11-22 04:01:21,699][06891] Updated weights for policy 0, policy_version 230 (0.0031) +[2023-11-22 04:01:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 954368. Throughput: 0: 721.3. Samples: 237662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-11-22 04:01:26,198][05156] Avg episode reward: [(0, '4.429')] +[2023-11-22 04:01:31,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 966656. Throughput: 0: 740.8. Samples: 242148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:01:31,202][05156] Avg episode reward: [(0, '4.425')] +[2023-11-22 04:01:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 978944. Throughput: 0: 737.6. Samples: 245586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:01:36,198][05156] Avg episode reward: [(0, '4.429')] +[2023-11-22 04:01:36,968][06891] Updated weights for policy 0, policy_version 240 (0.0014) +[2023-11-22 04:01:41,196][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 991232. Throughput: 0: 716.3. Samples: 247348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:01:41,199][05156] Avg episode reward: [(0, '4.428')] +[2023-11-22 04:01:46,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1007616. Throughput: 0: 680.4. Samples: 251452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:01:46,202][05156] Avg episode reward: [(0, '4.508')] +[2023-11-22 04:01:50,172][06891] Updated weights for policy 0, policy_version 250 (0.0038) +[2023-11-22 04:01:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2888.0). Total num frames: 1028096. Throughput: 0: 720.8. Samples: 257046. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:01:51,203][05156] Avg episode reward: [(0, '4.684')]
+[2023-11-22 04:01:56,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 1040384. Throughput: 0: 738.1. Samples: 259646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:01:56,198][05156] Avg episode reward: [(0, '4.785')]
+[2023-11-22 04:01:56,200][06878] Saving new best policy, reward=4.785!
+[2023-11-22 04:02:01,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 1052672. Throughput: 0: 740.0. Samples: 263292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-11-22 04:02:01,199][05156] Avg episode reward: [(0, '4.928')]
+[2023-11-22 04:02:01,214][06878] Saving new best policy, reward=4.928!
+[2023-11-22 04:02:05,517][06891] Updated weights for policy 0, policy_version 260 (0.0015)
+[2023-11-22 04:02:06,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 1064960. Throughput: 0: 705.3. Samples: 266806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-11-22 04:02:06,200][05156] Avg episode reward: [(0, '4.729')]
+[2023-11-22 04:02:11,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2874.1). Total num frames: 1077248. Throughput: 0: 685.5. Samples: 268508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:02:11,202][05156] Avg episode reward: [(0, '4.739')]
+[2023-11-22 04:02:16,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 1097728. Throughput: 0: 710.1. Samples: 274102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:02:16,198][05156] Avg episode reward: [(0, '4.616')]
+[2023-11-22 04:02:17,819][06891] Updated weights for policy 0, policy_version 270 (0.0031)
+[2023-11-22 04:02:21,195][05156] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 1114112. Throughput: 0: 746.7. Samples: 279188.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:02:21,202][05156] Avg episode reward: [(0, '4.669')]
+[2023-11-22 04:02:26,201][05156] Fps is (10 sec: 2865.7, 60 sec: 2866.9, 300 sec: 2874.1). Total num frames: 1126400. Throughput: 0: 747.3. Samples: 280980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:02:26,208][05156] Avg episode reward: [(0, '4.663')]
+[2023-11-22 04:02:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2874.2). Total num frames: 1138688. Throughput: 0: 736.4. Samples: 284588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:02:31,209][05156] Avg episode reward: [(0, '4.670')]
+[2023-11-22 04:02:34,420][06891] Updated weights for policy 0, policy_version 280 (0.0026)
+[2023-11-22 04:02:36,195][05156] Fps is (10 sec: 2458.9, 60 sec: 2867.2, 300 sec: 2860.3). Total num frames: 1150976. Throughput: 0: 694.1. Samples: 288282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:02:36,198][05156] Avg episode reward: [(0, '4.829')]
+[2023-11-22 04:02:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2860.3). Total num frames: 1167360. Throughput: 0: 698.2. Samples: 291064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:02:41,203][05156] Avg episode reward: [(0, '5.031')]
+[2023-11-22 04:02:41,214][06878] Saving new best policy, reward=5.031!
+[2023-11-22 04:02:46,103][06891] Updated weights for policy 0, policy_version 290 (0.0024)
+[2023-11-22 04:02:46,196][05156] Fps is (10 sec: 3686.2, 60 sec: 3003.7, 300 sec: 2860.3). Total num frames: 1187840. Throughput: 0: 740.5. Samples: 296616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:02:46,198][05156] Avg episode reward: [(0, '5.181')]
+[2023-11-22 04:02:46,203][06878] Saving new best policy, reward=5.181!
+[2023-11-22 04:02:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2860.3). Total num frames: 1200128. Throughput: 0: 743.6. Samples: 300270.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:02:51,198][05156] Avg episode reward: [(0, '5.253')]
+[2023-11-22 04:02:51,211][06878] Saving new best policy, reward=5.253!
+[2023-11-22 04:02:56,195][05156] Fps is (10 sec: 2048.1, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 1208320. Throughput: 0: 743.9. Samples: 301984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:02:56,202][05156] Avg episode reward: [(0, '5.151')]
+[2023-11-22 04:03:01,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 1220608. Throughput: 0: 700.6. Samples: 305628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:03:01,204][05156] Avg episode reward: [(0, '5.032')]
+[2023-11-22 04:03:02,474][06891] Updated weights for policy 0, policy_version 300 (0.0025)
+[2023-11-22 04:03:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2846.4). Total num frames: 1241088. Throughput: 0: 705.3. Samples: 310928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:03:06,200][05156] Avg episode reward: [(0, '5.244')]
+[2023-11-22 04:03:11,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2846.4). Total num frames: 1257472. Throughput: 0: 729.2. Samples: 313790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:11,198][05156] Avg episode reward: [(0, '5.407')]
+[2023-11-22 04:03:11,238][06878] Saving new best policy, reward=5.407!
+[2023-11-22 04:03:14,739][06891] Updated weights for policy 0, policy_version 310 (0.0044)
+[2023-11-22 04:03:16,197][05156] Fps is (10 sec: 2866.7, 60 sec: 2867.1, 300 sec: 2846.4). Total num frames: 1269760. Throughput: 0: 743.8. Samples: 318060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:16,207][05156] Avg episode reward: [(0, '5.536')]
+[2023-11-22 04:03:16,209][06878] Saving new best policy, reward=5.536!
+[2023-11-22 04:03:21,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2798.9, 300 sec: 2860.3). Total num frames: 1282048.
Throughput: 0: 743.0. Samples: 321718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:21,202][05156] Avg episode reward: [(0, '5.291')]
+[2023-11-22 04:03:21,218][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000313_1282048.pth...
+[2023-11-22 04:03:21,358][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000144_589824.pth
+[2023-11-22 04:03:26,195][05156] Fps is (10 sec: 2458.0, 60 sec: 2799.2, 300 sec: 2874.1). Total num frames: 1294336. Throughput: 0: 719.4. Samples: 323436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 04:03:26,203][05156] Avg episode reward: [(0, '5.275')]
+[2023-11-22 04:03:29,798][06891] Updated weights for policy 0, policy_version 320 (0.0037)
+[2023-11-22 04:03:31,195][05156] Fps is (10 sec: 3277.0, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1314816. Throughput: 0: 697.7. Samples: 328012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:03:31,198][05156] Avg episode reward: [(0, '5.393')]
+[2023-11-22 04:03:36,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1331200. Throughput: 0: 740.5. Samples: 333592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:36,201][05156] Avg episode reward: [(0, '5.568')]
+[2023-11-22 04:03:36,204][06878] Saving new best policy, reward=5.568!
+[2023-11-22 04:03:41,196][05156] Fps is (10 sec: 2867.1, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1343488. Throughput: 0: 751.8. Samples: 335816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:41,198][05156] Avg episode reward: [(0, '5.622')]
+[2023-11-22 04:03:41,218][06878] Saving new best policy, reward=5.622!
+[2023-11-22 04:03:43,283][06891] Updated weights for policy 0, policy_version 330 (0.0014)
+[2023-11-22 04:03:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2888.0). Total num frames: 1355776. Throughput: 0: 751.2. Samples: 339432.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:46,198][05156] Avg episode reward: [(0, '5.441')]
+[2023-11-22 04:03:51,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 1368064. Throughput: 0: 714.4. Samples: 343078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:51,201][05156] Avg episode reward: [(0, '5.486')]
+[2023-11-22 04:03:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1384448. Throughput: 0: 695.4. Samples: 345082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:03:56,200][05156] Avg episode reward: [(0, '5.599')]
+[2023-11-22 04:03:57,635][06891] Updated weights for policy 0, policy_version 340 (0.0020)
+[2023-11-22 04:04:01,195][05156] Fps is (10 sec: 3687.0, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 1404928. Throughput: 0: 727.7. Samples: 350806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 04:04:01,198][05156] Avg episode reward: [(0, '5.884')]
+[2023-11-22 04:04:01,214][06878] Saving new best policy, reward=5.884!
+[2023-11-22 04:04:06,196][05156] Fps is (10 sec: 3276.6, 60 sec: 2935.4, 300 sec: 2915.8). Total num frames: 1417216. Throughput: 0: 749.8. Samples: 355460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:04:06,202][05156] Avg episode reward: [(0, '6.037')]
+[2023-11-22 04:04:06,207][06878] Saving new best policy, reward=6.037!
+[2023-11-22 04:04:11,201][05156] Fps is (10 sec: 2456.3, 60 sec: 2867.0, 300 sec: 2888.0). Total num frames: 1429504. Throughput: 0: 750.8. Samples: 357226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:04:11,203][05156] Avg episode reward: [(0, '6.006')]
+[2023-11-22 04:04:11,768][06891] Updated weights for policy 0, policy_version 350 (0.0022)
+[2023-11-22 04:04:16,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1441792. Throughput: 0: 729.8. Samples: 360854.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:04:16,200][05156] Avg episode reward: [(0, '5.728')]
+[2023-11-22 04:04:21,195][05156] Fps is (10 sec: 2868.7, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1458176. Throughput: 0: 699.9. Samples: 365086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:04:21,202][05156] Avg episode reward: [(0, '5.719')]
+[2023-11-22 04:04:25,160][06891] Updated weights for policy 0, policy_version 360 (0.0019)
+[2023-11-22 04:04:26,195][05156] Fps is (10 sec: 3687.1, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 1478656. Throughput: 0: 715.1. Samples: 367994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:04:26,198][05156] Avg episode reward: [(0, '5.926')]
+[2023-11-22 04:04:31,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1490944. Throughput: 0: 757.0. Samples: 373496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:04:31,198][05156] Avg episode reward: [(0, '6.494')]
+[2023-11-22 04:04:31,216][06878] Saving new best policy, reward=6.494!
+[2023-11-22 04:04:36,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1503232. Throughput: 0: 754.1. Samples: 377010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:04:36,201][05156] Avg episode reward: [(0, '6.457')]
+[2023-11-22 04:04:40,131][06891] Updated weights for policy 0, policy_version 370 (0.0027)
+[2023-11-22 04:04:41,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1515520. Throughput: 0: 749.7. Samples: 378818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:04:41,199][05156] Avg episode reward: [(0, '6.776')]
+[2023-11-22 04:04:41,219][06878] Saving new best policy, reward=6.776!
+[2023-11-22 04:04:46,195][05156] Fps is (10 sec: 2457.7, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1527808. Throughput: 0: 704.2. Samples: 382496.
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-11-22 04:04:46,197][05156] Avg episode reward: [(0, '6.272')]
+[2023-11-22 04:04:51,197][05156] Fps is (10 sec: 3276.3, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 1548288. Throughput: 0: 724.4. Samples: 388058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:04:51,202][05156] Avg episode reward: [(0, '6.264')]
+[2023-11-22 04:04:52,281][06891] Updated weights for policy 0, policy_version 380 (0.0035)
+[2023-11-22 04:04:56,199][05156] Fps is (10 sec: 4094.5, 60 sec: 3071.8, 300 sec: 2929.8). Total num frames: 1568768. Throughput: 0: 749.0. Samples: 390932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:04:56,202][05156] Avg episode reward: [(0, '6.210')]
+[2023-11-22 04:05:01,195][05156] Fps is (10 sec: 3277.3, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1581056. Throughput: 0: 764.5. Samples: 395256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:05:01,202][05156] Avg episode reward: [(0, '6.588')]
+[2023-11-22 04:05:06,195][05156] Fps is (10 sec: 2458.5, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1593344. Throughput: 0: 752.4. Samples: 398942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:05:06,197][05156] Avg episode reward: [(0, '6.608')]
+[2023-11-22 04:05:07,122][06891] Updated weights for policy 0, policy_version 390 (0.0015)
+[2023-11-22 04:05:11,196][05156] Fps is (10 sec: 2457.4, 60 sec: 2935.7, 300 sec: 2901.9). Total num frames: 1605632. Throughput: 0: 728.6. Samples: 400780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:05:11,198][05156] Avg episode reward: [(0, '6.683')]
+[2023-11-22 04:05:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2915.8). Total num frames: 1622016. Throughput: 0: 712.6. Samples: 405564.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-11-22 04:05:16,198][05156] Avg episode reward: [(0, '6.637')]
+[2023-11-22 04:05:19,490][06891] Updated weights for policy 0, policy_version 400 (0.0022)
+[2023-11-22 04:05:21,198][05156] Fps is (10 sec: 3685.6, 60 sec: 3071.9, 300 sec: 2943.5). Total num frames: 1642496. Throughput: 0: 761.8. Samples: 411292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:05:21,202][05156] Avg episode reward: [(0, '6.839')]
+[2023-11-22 04:05:21,223][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000401_1642496.pth...
+[2023-11-22 04:05:21,365][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000229_937984.pth
+[2023-11-22 04:05:21,386][06878] Saving new best policy, reward=6.839!
+[2023-11-22 04:05:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1654784. Throughput: 0: 763.2. Samples: 413162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-11-22 04:05:26,198][05156] Avg episode reward: [(0, '6.863')]
+[2023-11-22 04:05:26,203][06878] Saving new best policy, reward=6.863!
+[2023-11-22 04:05:31,205][05156] Fps is (10 sec: 2455.8, 60 sec: 2935.0, 300 sec: 2901.8). Total num frames: 1667072. Throughput: 0: 762.4. Samples: 416812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:05:31,222][05156] Avg episode reward: [(0, '6.658')]
+[2023-11-22 04:05:36,107][06891] Updated weights for policy 0, policy_version 410 (0.0038)
+[2023-11-22 04:05:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1679360. Throughput: 0: 718.6. Samples: 420394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 04:05:36,198][05156] Avg episode reward: [(0, '7.107')]
+[2023-11-22 04:05:36,200][06878] Saving new best policy, reward=7.107!
+[2023-11-22 04:05:41,195][05156] Fps is (10 sec: 2870.1, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 1695744.
Throughput: 0: 704.7. Samples: 422640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-11-22 04:05:41,199][05156] Avg episode reward: [(0, '7.421')]
+[2023-11-22 04:05:41,211][06878] Saving new best policy, reward=7.421!
+[2023-11-22 04:05:46,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 1712128. Throughput: 0: 734.1. Samples: 428292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:05:46,198][05156] Avg episode reward: [(0, '7.633')]
+[2023-11-22 04:05:46,218][06878] Saving new best policy, reward=7.633!
+[2023-11-22 04:05:47,399][06891] Updated weights for policy 0, policy_version 420 (0.0018)
+[2023-11-22 04:05:51,201][05156] Fps is (10 sec: 3275.1, 60 sec: 3003.5, 300 sec: 2929.6). Total num frames: 1728512. Throughput: 0: 750.3. Samples: 432708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:05:51,208][05156] Avg episode reward: [(0, '7.561')]
+[2023-11-22 04:05:56,196][05156] Fps is (10 sec: 2867.1, 60 sec: 2867.4, 300 sec: 2901.9). Total num frames: 1740800. Throughput: 0: 751.6. Samples: 434604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:05:56,202][05156] Avg episode reward: [(0, '7.104')]
+[2023-11-22 04:06:01,197][05156] Fps is (10 sec: 2458.5, 60 sec: 2867.1, 300 sec: 2901.9). Total num frames: 1753088. Throughput: 0: 726.1. Samples: 438238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 04:06:01,201][05156] Avg episode reward: [(0, '7.215')]
+[2023-11-22 04:06:03,638][06891] Updated weights for policy 0, policy_version 430 (0.0035)
+[2023-11-22 04:06:06,195][05156] Fps is (10 sec: 2867.3, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1769472. Throughput: 0: 700.7. Samples: 442822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:06:06,200][05156] Avg episode reward: [(0, '7.219')]
+[2023-11-22 04:06:11,195][05156] Fps is (10 sec: 3687.0, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 1789952. Throughput: 0: 724.5.
Samples: 445766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:06:11,205][05156] Avg episode reward: [(0, '7.815')]
+[2023-11-22 04:06:11,215][06878] Saving new best policy, reward=7.815!
+[2023-11-22 04:06:15,213][06891] Updated weights for policy 0, policy_version 440 (0.0018)
+[2023-11-22 04:06:16,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1802240. Throughput: 0: 756.7. Samples: 450856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:06:16,202][05156] Avg episode reward: [(0, '8.087')]
+[2023-11-22 04:06:16,206][06878] Saving new best policy, reward=8.087!
+[2023-11-22 04:06:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 2915.8). Total num frames: 1814528. Throughput: 0: 757.4. Samples: 454478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:06:21,200][05156] Avg episode reward: [(0, '9.065')]
+[2023-11-22 04:06:21,213][06878] Saving new best policy, reward=9.065!
+[2023-11-22 04:06:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 1826816. Throughput: 0: 748.5. Samples: 456324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:06:26,198][05156] Avg episode reward: [(0, '8.924')]
+[2023-11-22 04:06:30,778][06891] Updated weights for policy 0, policy_version 450 (0.0037)
+[2023-11-22 04:06:31,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2936.0, 300 sec: 2929.7). Total num frames: 1843200. Throughput: 0: 712.4. Samples: 460352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:06:31,198][05156] Avg episode reward: [(0, '9.232')]
+[2023-11-22 04:06:31,221][06878] Saving new best policy, reward=9.232!
+[2023-11-22 04:06:36,196][05156] Fps is (10 sec: 3686.3, 60 sec: 3072.0, 300 sec: 2957.4). Total num frames: 1863680. Throughput: 0: 743.7. Samples: 466170.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:06:36,201][05156] Avg episode reward: [(0, '9.514')]
+[2023-11-22 04:06:36,203][06878] Saving new best policy, reward=9.514!
+[2023-11-22 04:06:41,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 1880064. Throughput: 0: 764.8. Samples: 469020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:06:41,199][05156] Avg episode reward: [(0, '9.404')]
+[2023-11-22 04:06:42,597][06891] Updated weights for policy 0, policy_version 460 (0.0023)
+[2023-11-22 04:06:46,196][05156] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1892352. Throughput: 0: 767.4. Samples: 472772. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:06:46,204][05156] Avg episode reward: [(0, '10.100')]
+[2023-11-22 04:06:46,205][06878] Saving new best policy, reward=10.100!
+[2023-11-22 04:06:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.7, 300 sec: 2929.7). Total num frames: 1904640. Throughput: 0: 749.1. Samples: 476532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:06:51,209][05156] Avg episode reward: [(0, '10.266')]
+[2023-11-22 04:06:51,232][06878] Saving new best policy, reward=10.266!
+[2023-11-22 04:06:56,195][05156] Fps is (10 sec: 2457.8, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1916928. Throughput: 0: 724.9. Samples: 478388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:06:56,198][05156] Avg episode reward: [(0, '10.107')]
+[2023-11-22 04:06:57,698][06891] Updated weights for policy 0, policy_version 470 (0.0013)
+[2023-11-22 04:07:01,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 2957.5). Total num frames: 1937408. Throughput: 0: 731.9. Samples: 483790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:07:01,198][05156] Avg episode reward: [(0, '10.392')]
+[2023-11-22 04:07:01,208][06878] Saving new best policy, reward=10.392!
+[2023-11-22 04:07:06,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 1953792. Throughput: 0: 770.5. Samples: 489150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:07:06,202][05156] Avg episode reward: [(0, '10.153')]
+[2023-11-22 04:07:10,438][06891] Updated weights for policy 0, policy_version 480 (0.0014)
+[2023-11-22 04:07:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1966080. Throughput: 0: 771.4. Samples: 491036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:07:11,200][05156] Avg episode reward: [(0, '10.279')]
+[2023-11-22 04:07:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1978368. Throughput: 0: 766.2. Samples: 494832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:07:16,198][05156] Avg episode reward: [(0, '10.458')]
+[2023-11-22 04:07:16,199][06878] Saving new best policy, reward=10.458!
+[2023-11-22 04:07:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1990656. Throughput: 0: 717.7. Samples: 498468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:07:21,202][05156] Avg episode reward: [(0, '10.927')]
+[2023-11-22 04:07:21,214][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000486_1990656.pth...
+[2023-11-22 04:07:21,331][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000313_1282048.pth
+[2023-11-22 04:07:21,343][06878] Saving new best policy, reward=10.927!
+[2023-11-22 04:07:24,675][06891] Updated weights for policy 0, policy_version 490 (0.0014)
+[2023-11-22 04:07:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 2011136. Throughput: 0: 717.5. Samples: 501308.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:07:26,198][05156] Avg episode reward: [(0, '11.591')]
+[2023-11-22 04:07:26,200][06878] Saving new best policy, reward=11.591!
+[2023-11-22 04:07:31,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 2985.2). Total num frames: 2031616. Throughput: 0: 766.9. Samples: 507280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:07:31,203][05156] Avg episode reward: [(0, '12.051')]
+[2023-11-22 04:07:31,217][06878] Saving new best policy, reward=12.051!
+[2023-11-22 04:07:36,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2043904. Throughput: 0: 771.2. Samples: 511234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:07:36,205][05156] Avg episode reward: [(0, '12.286')]
+[2023-11-22 04:07:36,206][06878] Saving new best policy, reward=12.286!
+[2023-11-22 04:07:38,097][06891] Updated weights for policy 0, policy_version 500 (0.0020)
+[2023-11-22 04:07:41,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 2052096. Throughput: 0: 766.7. Samples: 512890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:07:41,201][05156] Avg episode reward: [(0, '11.796')]
+[2023-11-22 04:07:46,196][05156] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 2064384. Throughput: 0: 729.3. Samples: 516608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:07:46,198][05156] Avg episode reward: [(0, '12.246')]
+[2023-11-22 04:07:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2084864. Throughput: 0: 724.6. Samples: 521756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:07:51,198][05156] Avg episode reward: [(0, '13.190')]
+[2023-11-22 04:07:51,206][06878] Saving new best policy, reward=13.190!
+[2023-11-22 04:07:51,962][06891] Updated weights for policy 0, policy_version 510 (0.0020)
+[2023-11-22 04:07:56,195][05156] Fps is (10 sec: 4096.1, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 2105344. Throughput: 0: 748.4. Samples: 524712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:07:56,198][05156] Avg episode reward: [(0, '13.524')]
+[2023-11-22 04:07:56,200][06878] Saving new best policy, reward=13.524!
+[2023-11-22 04:08:01,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2117632. Throughput: 0: 765.8. Samples: 529292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:08:01,197][05156] Avg episode reward: [(0, '14.561')]
+[2023-11-22 04:08:01,210][06878] Saving new best policy, reward=14.561!
+[2023-11-22 04:08:05,532][06891] Updated weights for policy 0, policy_version 520 (0.0023)
+[2023-11-22 04:08:06,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 2129920. Throughput: 0: 767.2. Samples: 532992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:08:06,198][05156] Avg episode reward: [(0, '15.548')]
+[2023-11-22 04:08:06,203][06878] Saving new best policy, reward=15.548!
+[2023-11-22 04:08:11,197][05156] Fps is (10 sec: 2457.2, 60 sec: 2935.4, 300 sec: 2957.4). Total num frames: 2142208. Throughput: 0: 743.8. Samples: 534780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:08:11,203][05156] Avg episode reward: [(0, '15.024')]
+[2023-11-22 04:08:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2158592. Throughput: 0: 711.7. Samples: 539308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:08:16,198][05156] Avg episode reward: [(0, '15.533')]
+[2023-11-22 04:08:18,760][06891] Updated weights for policy 0, policy_version 530 (0.0023)
+[2023-11-22 04:08:21,195][05156] Fps is (10 sec: 3687.0, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 2179072.
Throughput: 0: 755.1. Samples: 545214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:08:21,201][05156] Avg episode reward: [(0, '15.017')]
+[2023-11-22 04:08:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2191360. Throughput: 0: 769.8. Samples: 547532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:08:26,200][05156] Avg episode reward: [(0, '13.726')]
+[2023-11-22 04:08:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2957.5). Total num frames: 2203648. Throughput: 0: 772.7. Samples: 551378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:08:31,203][05156] Avg episode reward: [(0, '13.754')]
+[2023-11-22 04:08:33,140][06891] Updated weights for policy 0, policy_version 540 (0.0021)
+[2023-11-22 04:08:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2957.5). Total num frames: 2215936. Throughput: 0: 739.6. Samples: 555036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:08:36,201][05156] Avg episode reward: [(0, '13.851')]
+[2023-11-22 04:08:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2232320. Throughput: 0: 720.0. Samples: 557112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 04:08:41,202][05156] Avg episode reward: [(0, '12.784')]
+[2023-11-22 04:08:45,470][06891] Updated weights for policy 0, policy_version 550 (0.0019)
+[2023-11-22 04:08:46,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 2252800. Throughput: 0: 751.0. Samples: 563086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-11-22 04:08:46,205][05156] Avg episode reward: [(0, '13.759')]
+[2023-11-22 04:08:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 2269184. Throughput: 0: 780.8. Samples: 568128.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:08:51,202][05156] Avg episode reward: [(0, '14.106')]
+[2023-11-22 04:08:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 2281472. Throughput: 0: 783.4. Samples: 570032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:08:56,202][05156] Avg episode reward: [(0, '14.623')]
+[2023-11-22 04:09:00,313][06891] Updated weights for policy 0, policy_version 560 (0.0013)
+[2023-11-22 04:09:01,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 2293760. Throughput: 0: 765.6. Samples: 573762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:09:01,204][05156] Avg episode reward: [(0, '15.614')]
+[2023-11-22 04:09:01,218][06878] Saving new best policy, reward=15.614!
+[2023-11-22 04:09:06,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2985.3). Total num frames: 2310144. Throughput: 0: 735.3. Samples: 578304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:09:06,202][05156] Avg episode reward: [(0, '15.894')]
+[2023-11-22 04:09:06,205][06878] Saving new best policy, reward=15.894!
+[2023-11-22 04:09:11,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 3013.0). Total num frames: 2330624. Throughput: 0: 747.2. Samples: 581154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:09:11,201][05156] Avg episode reward: [(0, '15.337')]
+[2023-11-22 04:09:11,881][06891] Updated weights for policy 0, policy_version 570 (0.0033)
+[2023-11-22 04:09:16,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2347008. Throughput: 0: 781.3. Samples: 586538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:09:16,198][05156] Avg episode reward: [(0, '14.867')]
+[2023-11-22 04:09:21,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2359296. Throughput: 0: 786.9. Samples: 590446.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:09:21,203][05156] Avg episode reward: [(0, '13.953')]
+[2023-11-22 04:09:21,212][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth...
+[2023-11-22 04:09:21,359][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000401_1642496.pth
+[2023-11-22 04:09:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2371584. Throughput: 0: 782.6. Samples: 592330. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:09:26,198][05156] Avg episode reward: [(0, '14.771')]
+[2023-11-22 04:09:27,271][06891] Updated weights for policy 0, policy_version 580 (0.0018)
+[2023-11-22 04:09:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2383872. Throughput: 0: 731.3. Samples: 595994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-11-22 04:09:31,201][05156] Avg episode reward: [(0, '14.867')]
+[2023-11-22 04:09:36,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2404352. Throughput: 0: 751.5. Samples: 601944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-11-22 04:09:36,198][05156] Avg episode reward: [(0, '15.682')]
+[2023-11-22 04:09:38,567][06891] Updated weights for policy 0, policy_version 590 (0.0030)
+[2023-11-22 04:09:41,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 2424832. Throughput: 0: 776.5. Samples: 604974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:09:41,205][05156] Avg episode reward: [(0, '15.999')]
+[2023-11-22 04:09:41,219][06878] Saving new best policy, reward=15.999!
+[2023-11-22 04:09:46,199][05156] Fps is (10 sec: 3275.6, 60 sec: 3071.8, 300 sec: 3013.0). Total num frames: 2437120. Throughput: 0: 784.0. Samples: 609046.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:09:46,201][05156] Avg episode reward: [(0, '16.978')] +[2023-11-22 04:09:46,204][06878] Saving new best policy, reward=16.978! +[2023-11-22 04:09:51,199][05156] Fps is (10 sec: 2456.7, 60 sec: 3003.6, 300 sec: 2985.2). Total num frames: 2449408. Throughput: 0: 768.7. Samples: 612896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:09:51,201][05156] Avg episode reward: [(0, '16.994')] +[2023-11-22 04:09:51,218][06878] Saving new best policy, reward=16.994! +[2023-11-22 04:09:54,168][06891] Updated weights for policy 0, policy_version 600 (0.0019) +[2023-11-22 04:09:56,195][05156] Fps is (10 sec: 2458.5, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2461696. Throughput: 0: 744.5. Samples: 614658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:09:56,198][05156] Avg episode reward: [(0, '16.785')] +[2023-11-22 04:10:01,195][05156] Fps is (10 sec: 3278.0, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2482176. Throughput: 0: 740.8. Samples: 619874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:10:01,198][05156] Avg episode reward: [(0, '18.278')] +[2023-11-22 04:10:01,215][06878] Saving new best policy, reward=18.278! +[2023-11-22 04:10:05,177][06891] Updated weights for policy 0, policy_version 610 (0.0021) +[2023-11-22 04:10:06,197][05156] Fps is (10 sec: 3685.8, 60 sec: 3140.2, 300 sec: 3026.9). Total num frames: 2498560. Throughput: 0: 783.5. Samples: 625706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:10:06,203][05156] Avg episode reward: [(0, '18.083')] +[2023-11-22 04:10:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 2510848. Throughput: 0: 785.1. Samples: 627658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:10:11,204][05156] Avg episode reward: [(0, '18.486')] +[2023-11-22 04:10:11,217][06878] Saving new best policy, reward=18.486! 
+[2023-11-22 04:10:16,195][05156] Fps is (10 sec: 2458.0, 60 sec: 2935.5, 300 sec: 2985.2). Total num frames: 2523136. Throughput: 0: 787.0. Samples: 631410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:10:16,203][05156] Avg episode reward: [(0, '18.291')] +[2023-11-22 04:10:20,878][06891] Updated weights for policy 0, policy_version 620 (0.0016) +[2023-11-22 04:10:21,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 2539520. Throughput: 0: 741.2. Samples: 635300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:10:21,199][05156] Avg episode reward: [(0, '18.189')] +[2023-11-22 04:10:26,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3027.0). Total num frames: 2560000. Throughput: 0: 737.7. Samples: 638172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:10:26,197][05156] Avg episode reward: [(0, '17.159')] +[2023-11-22 04:10:31,038][06891] Updated weights for policy 0, policy_version 630 (0.0020) +[2023-11-22 04:10:31,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3054.6). Total num frames: 2580480. Throughput: 0: 784.8. Samples: 644358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:10:31,203][05156] Avg episode reward: [(0, '16.638')] +[2023-11-22 04:10:36,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 2592768. Throughput: 0: 798.3. Samples: 648816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:10:36,202][05156] Avg episode reward: [(0, '17.171')] +[2023-11-22 04:10:41,197][05156] Fps is (10 sec: 2457.2, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 2605056. Throughput: 0: 801.1. Samples: 650710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:10:41,202][05156] Avg episode reward: [(0, '17.343')] +[2023-11-22 04:10:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.9, 300 sec: 3013.0). Total num frames: 2617344. Throughput: 0: 768.9. Samples: 654474. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:10:46,200][05156] Avg episode reward: [(0, '18.108')] +[2023-11-22 04:10:46,970][06891] Updated weights for policy 0, policy_version 640 (0.0035) +[2023-11-22 04:10:51,195][05156] Fps is (10 sec: 2867.6, 60 sec: 3072.2, 300 sec: 3026.9). Total num frames: 2633728. Throughput: 0: 744.9. Samples: 659224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:10:51,198][05156] Avg episode reward: [(0, '17.344')] +[2023-11-22 04:10:56,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3054.7). Total num frames: 2654208. Throughput: 0: 766.0. Samples: 662126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:10:56,198][05156] Avg episode reward: [(0, '18.163')] +[2023-11-22 04:10:57,688][06891] Updated weights for policy 0, policy_version 650 (0.0018) +[2023-11-22 04:11:01,200][05156] Fps is (10 sec: 3685.3, 60 sec: 3140.1, 300 sec: 3054.6). Total num frames: 2670592. Throughput: 0: 801.9. Samples: 667496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:11:01,205][05156] Avg episode reward: [(0, '18.739')] +[2023-11-22 04:11:01,222][06878] Saving new best policy, reward=18.739! +[2023-11-22 04:11:06,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.1, 300 sec: 3026.9). Total num frames: 2682880. Throughput: 0: 798.7. Samples: 671240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:11:06,201][05156] Avg episode reward: [(0, '18.660')] +[2023-11-22 04:11:11,195][05156] Fps is (10 sec: 2458.4, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 2695168. Throughput: 0: 775.1. Samples: 673050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:11:11,198][05156] Avg episode reward: [(0, '20.152')] +[2023-11-22 04:11:11,206][06878] Saving new best policy, reward=20.152! +[2023-11-22 04:11:14,164][06891] Updated weights for policy 0, policy_version 660 (0.0020) +[2023-11-22 04:11:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3026.9). 
Total num frames: 2707456. Throughput: 0: 723.4. Samples: 676912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:11:16,198][05156] Avg episode reward: [(0, '20.790')] +[2023-11-22 04:11:16,206][06878] Saving new best policy, reward=20.790! +[2023-11-22 04:11:21,197][05156] Fps is (10 sec: 2457.1, 60 sec: 3003.6, 300 sec: 3026.9). Total num frames: 2719744. Throughput: 0: 707.4. Samples: 680652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:11:21,202][05156] Avg episode reward: [(0, '20.220')] +[2023-11-22 04:11:21,214][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth... +[2023-11-22 04:11:21,375][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000486_1990656.pth +[2023-11-22 04:11:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 3013.0). Total num frames: 2732032. Throughput: 0: 706.4. Samples: 682496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:11:26,198][05156] Avg episode reward: [(0, '20.614')] +[2023-11-22 04:11:31,195][05156] Fps is (10 sec: 2048.4, 60 sec: 2662.4, 300 sec: 2971.3). Total num frames: 2740224. Throughput: 0: 689.3. Samples: 685494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:11:31,198][05156] Avg episode reward: [(0, '20.620')] +[2023-11-22 04:11:31,791][06891] Updated weights for policy 0, policy_version 670 (0.0052) +[2023-11-22 04:11:36,200][05156] Fps is (10 sec: 2047.1, 60 sec: 2662.2, 300 sec: 2957.4). Total num frames: 2752512. Throughput: 0: 660.5. Samples: 688948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:11:36,202][05156] Avg episode reward: [(0, '20.278')] +[2023-11-22 04:11:41,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2957.5). Total num frames: 2764800. Throughput: 0: 640.5. Samples: 690950. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-11-22 04:11:41,202][05156] Avg episode reward: [(0, '20.879')] +[2023-11-22 04:11:41,216][06878] Saving new best policy, reward=20.879! +[2023-11-22 04:11:46,195][05156] Fps is (10 sec: 2868.5, 60 sec: 2730.7, 300 sec: 2971.3). Total num frames: 2781184. Throughput: 0: 614.4. Samples: 695140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:11:46,204][05156] Avg episode reward: [(0, '21.041')] +[2023-11-22 04:11:46,206][06878] Saving new best policy, reward=21.041! +[2023-11-22 04:11:46,441][06891] Updated weights for policy 0, policy_version 680 (0.0014) +[2023-11-22 04:11:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 2798.9, 300 sec: 2999.1). Total num frames: 2801664. Throughput: 0: 664.0. Samples: 701118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:11:51,200][05156] Avg episode reward: [(0, '21.091')] +[2023-11-22 04:11:51,215][06878] Saving new best policy, reward=21.091! +[2023-11-22 04:11:56,196][05156] Fps is (10 sec: 3686.2, 60 sec: 2730.6, 300 sec: 2985.2). Total num frames: 2818048. Throughput: 0: 685.4. Samples: 703894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:11:56,200][05156] Avg episode reward: [(0, '21.150')] +[2023-11-22 04:11:56,202][06878] Saving new best policy, reward=21.150! +[2023-11-22 04:11:58,873][06891] Updated weights for policy 0, policy_version 690 (0.0031) +[2023-11-22 04:12:01,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2662.5, 300 sec: 2971.3). Total num frames: 2830336. Throughput: 0: 681.8. Samples: 707594. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-11-22 04:12:01,200][05156] Avg episode reward: [(0, '21.972')] +[2023-11-22 04:12:01,214][06878] Saving new best policy, reward=21.972! +[2023-11-22 04:12:06,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2662.3, 300 sec: 2971.3). Total num frames: 2842624. Throughput: 0: 680.8. Samples: 711290. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:12:06,200][05156] Avg episode reward: [(0, '22.124')] +[2023-11-22 04:12:06,203][06878] Saving new best policy, reward=22.124! +[2023-11-22 04:12:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2985.2). Total num frames: 2859008. Throughput: 0: 679.5. Samples: 713072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:12:11,198][05156] Avg episode reward: [(0, '21.421')] +[2023-11-22 04:12:13,174][06891] Updated weights for policy 0, policy_version 700 (0.0014) +[2023-11-22 04:12:16,195][05156] Fps is (10 sec: 3277.3, 60 sec: 2798.9, 300 sec: 2999.1). Total num frames: 2875392. Throughput: 0: 740.4. Samples: 718814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:12:16,198][05156] Avg episode reward: [(0, '20.765')] +[2023-11-22 04:12:21,195][05156] Fps is (10 sec: 3686.5, 60 sec: 2935.6, 300 sec: 2999.1). Total num frames: 2895872. Throughput: 0: 785.2. Samples: 724280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:12:21,205][05156] Avg episode reward: [(0, '20.099')] +[2023-11-22 04:12:25,932][06891] Updated weights for policy 0, policy_version 710 (0.0018) +[2023-11-22 04:12:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 2908160. Throughput: 0: 782.8. Samples: 726176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:12:26,203][05156] Avg episode reward: [(0, '20.335')] +[2023-11-22 04:12:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2920448. Throughput: 0: 776.4. Samples: 730076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:12:31,204][05156] Avg episode reward: [(0, '21.765')] +[2023-11-22 04:12:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3004.0, 300 sec: 2985.2). Total num frames: 2932736. Throughput: 0: 733.3. Samples: 734116. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:12:36,197][05156] Avg episode reward: [(0, '21.044')] +[2023-11-22 04:12:39,397][06891] Updated weights for policy 0, policy_version 720 (0.0025) +[2023-11-22 04:12:41,195][05156] Fps is (10 sec: 3276.7, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2953216. Throughput: 0: 738.1. Samples: 737110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:12:41,203][05156] Avg episode reward: [(0, '21.669')] +[2023-11-22 04:12:46,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3013.0). Total num frames: 2973696. Throughput: 0: 788.0. Samples: 743056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:12:46,199][05156] Avg episode reward: [(0, '22.084')] +[2023-11-22 04:12:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 2985984. Throughput: 0: 792.8. Samples: 746966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:12:51,198][05156] Avg episode reward: [(0, '23.145')] +[2023-11-22 04:12:51,210][06878] Saving new best policy, reward=23.145! +[2023-11-22 04:12:52,408][06891] Updated weights for policy 0, policy_version 730 (0.0024) +[2023-11-22 04:12:56,196][05156] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2998272. Throughput: 0: 796.2. Samples: 748902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:12:56,203][05156] Avg episode reward: [(0, '23.776')] +[2023-11-22 04:12:56,206][06878] Saving new best policy, reward=23.776! +[2023-11-22 04:13:01,196][05156] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 3010560. Throughput: 0: 749.9. Samples: 752558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:13:01,198][05156] Avg episode reward: [(0, '24.874')] +[2023-11-22 04:13:01,213][06878] Saving new best policy, reward=24.874! 
+[2023-11-22 04:13:05,975][06891] Updated weights for policy 0, policy_version 740 (0.0024) +[2023-11-22 04:13:06,195][05156] Fps is (10 sec: 3276.9, 60 sec: 3140.4, 300 sec: 3013.0). Total num frames: 3031040. Throughput: 0: 750.0. Samples: 758030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:13:06,201][05156] Avg episode reward: [(0, '25.598')] +[2023-11-22 04:13:06,207][06878] Saving new best policy, reward=25.598! +[2023-11-22 04:13:11,195][05156] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 3047424. Throughput: 0: 773.1. Samples: 760964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:13:11,201][05156] Avg episode reward: [(0, '24.715')] +[2023-11-22 04:13:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 3059712. Throughput: 0: 786.4. Samples: 765466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:13:16,199][05156] Avg episode reward: [(0, '23.849')] +[2023-11-22 04:13:19,516][06891] Updated weights for policy 0, policy_version 750 (0.0014) +[2023-11-22 04:13:21,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2935.4, 300 sec: 2985.2). Total num frames: 3072000. Throughput: 0: 781.9. Samples: 769302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:13:21,202][05156] Avg episode reward: [(0, '23.683')] +[2023-11-22 04:13:21,258][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000751_3076096.pth... +[2023-11-22 04:13:21,393][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth +[2023-11-22 04:13:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2985.2). Total num frames: 3084288. Throughput: 0: 755.4. Samples: 771102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:13:26,198][05156] Avg episode reward: [(0, '22.674')] +[2023-11-22 04:13:31,195][05156] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3013.0). 
Total num frames: 3104768. Throughput: 0: 732.0. Samples: 775994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:13:31,198][05156] Avg episode reward: [(0, '21.411')] +[2023-11-22 04:13:32,461][06891] Updated weights for policy 0, policy_version 760 (0.0014) +[2023-11-22 04:13:36,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 3125248. Throughput: 0: 778.0. Samples: 781974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-11-22 04:13:36,198][05156] Avg episode reward: [(0, '20.716')] +[2023-11-22 04:13:41,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3137536. Throughput: 0: 783.4. Samples: 784156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:13:41,202][05156] Avg episode reward: [(0, '21.679')] +[2023-11-22 04:13:46,048][06891] Updated weights for policy 0, policy_version 770 (0.0038) +[2023-11-22 04:13:46,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 3153920. Throughput: 0: 787.9. Samples: 788014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:13:46,201][05156] Avg episode reward: [(0, '23.157')] +[2023-11-22 04:13:51,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 3166208. Throughput: 0: 754.6. Samples: 791988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:13:51,203][05156] Avg episode reward: [(0, '22.659')] +[2023-11-22 04:13:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3182592. Throughput: 0: 738.5. Samples: 794196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-11-22 04:13:56,198][05156] Avg episode reward: [(0, '22.170')] +[2023-11-22 04:13:58,685][06891] Updated weights for policy 0, policy_version 780 (0.0016) +[2023-11-22 04:14:01,196][05156] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 3203072. Throughput: 0: 772.9. Samples: 800246. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:14:01,205][05156] Avg episode reward: [(0, '21.700')] +[2023-11-22 04:14:06,201][05156] Fps is (10 sec: 3684.4, 60 sec: 3140.0, 300 sec: 3012.9). Total num frames: 3219456. Throughput: 0: 801.6. Samples: 805380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-11-22 04:14:06,203][05156] Avg episode reward: [(0, '22.940')] +[2023-11-22 04:14:11,196][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3231744. Throughput: 0: 803.7. Samples: 807268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:14:11,198][05156] Avg episode reward: [(0, '22.366')] +[2023-11-22 04:14:12,249][06891] Updated weights for policy 0, policy_version 790 (0.0012) +[2023-11-22 04:14:16,195][05156] Fps is (10 sec: 2458.9, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3244032. Throughput: 0: 780.4. Samples: 811114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:14:16,202][05156] Avg episode reward: [(0, '22.147')] +[2023-11-22 04:14:21,195][05156] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 3260416. Throughput: 0: 750.6. Samples: 815750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:14:21,204][05156] Avg episode reward: [(0, '22.734')] +[2023-11-22 04:14:24,699][06891] Updated weights for policy 0, policy_version 800 (0.0029) +[2023-11-22 04:14:26,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 3280896. Throughput: 0: 769.9. Samples: 818800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:14:26,198][05156] Avg episode reward: [(0, '24.043')] +[2023-11-22 04:14:31,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 3297280. Throughput: 0: 811.6. Samples: 824534. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:14:31,198][05156] Avg episode reward: [(0, '24.913')] +[2023-11-22 04:14:36,196][05156] Fps is (10 sec: 2867.0, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3309568. Throughput: 0: 810.5. Samples: 828460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:14:36,201][05156] Avg episode reward: [(0, '24.507')] +[2023-11-22 04:14:38,265][06891] Updated weights for policy 0, policy_version 810 (0.0019) +[2023-11-22 04:14:41,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3321856. Throughput: 0: 803.7. Samples: 830362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:14:41,203][05156] Avg episode reward: [(0, '24.406')] +[2023-11-22 04:14:46,195][05156] Fps is (10 sec: 2867.4, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3338240. Throughput: 0: 755.6. Samples: 834246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:14:46,198][05156] Avg episode reward: [(0, '23.702')] +[2023-11-22 04:14:50,763][06891] Updated weights for policy 0, policy_version 820 (0.0022) +[2023-11-22 04:14:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 3358720. Throughput: 0: 775.9. Samples: 840292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:14:51,198][05156] Avg episode reward: [(0, '23.647')] +[2023-11-22 04:14:56,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 3379200. Throughput: 0: 801.3. Samples: 843328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:14:56,202][05156] Avg episode reward: [(0, '25.418')] +[2023-11-22 04:15:01,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 3391488. Throughput: 0: 807.5. Samples: 847454. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:15:01,198][05156] Avg episode reward: [(0, '25.987')] +[2023-11-22 04:15:01,209][06878] Saving new best policy, reward=25.987! +[2023-11-22 04:15:04,587][06891] Updated weights for policy 0, policy_version 830 (0.0025) +[2023-11-22 04:15:06,199][05156] Fps is (10 sec: 2047.2, 60 sec: 3003.8, 300 sec: 3013.0). Total num frames: 3399680. Throughput: 0: 790.7. Samples: 851336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:15:06,211][05156] Avg episode reward: [(0, '24.891')] +[2023-11-22 04:15:11,195][05156] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 3416064. Throughput: 0: 766.5. Samples: 853294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:15:11,205][05156] Avg episode reward: [(0, '24.704')] +[2023-11-22 04:15:16,195][05156] Fps is (10 sec: 3687.8, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 3436544. Throughput: 0: 760.8. Samples: 858772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:15:16,198][05156] Avg episode reward: [(0, '23.919')] +[2023-11-22 04:15:16,796][06891] Updated weights for policy 0, policy_version 840 (0.0013) +[2023-11-22 04:15:21,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 3457024. Throughput: 0: 806.3. Samples: 864742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:15:21,200][05156] Avg episode reward: [(0, '25.798')] +[2023-11-22 04:15:21,219][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth... +[2023-11-22 04:15:21,360][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth +[2023-11-22 04:15:26,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3140.2, 300 sec: 3013.0). Total num frames: 3469312. Throughput: 0: 805.4. Samples: 866606. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:15:26,203][05156] Avg episode reward: [(0, '26.161')] +[2023-11-22 04:15:26,205][06878] Saving new best policy, reward=26.161! +[2023-11-22 04:15:30,685][06891] Updated weights for policy 0, policy_version 850 (0.0015) +[2023-11-22 04:15:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3481600. Throughput: 0: 803.3. Samples: 870394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:15:31,198][05156] Avg episode reward: [(0, '24.897')] +[2023-11-22 04:15:36,195][05156] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3493888. Throughput: 0: 753.9. Samples: 874216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:15:36,198][05156] Avg episode reward: [(0, '23.802')] +[2023-11-22 04:15:41,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 3514368. Throughput: 0: 751.1. Samples: 877126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:15:41,198][05156] Avg episode reward: [(0, '24.381')] +[2023-11-22 04:15:42,658][06891] Updated weights for policy 0, policy_version 860 (0.0024) +[2023-11-22 04:15:46,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3054.6). Total num frames: 3534848. Throughput: 0: 795.3. Samples: 883242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:15:46,201][05156] Avg episode reward: [(0, '24.265')] +[2023-11-22 04:15:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 3547136. Throughput: 0: 803.3. Samples: 887482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:15:51,204][05156] Avg episode reward: [(0, '25.170')] +[2023-11-22 04:15:56,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 3559424. Throughput: 0: 803.6. Samples: 889458. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-11-22 04:15:56,200][05156] Avg episode reward: [(0, '24.861')] +[2023-11-22 04:15:56,818][06891] Updated weights for policy 0, policy_version 870 (0.0032) +[2023-11-22 04:16:01,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 3571712. Throughput: 0: 767.6. Samples: 893316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:16:01,204][05156] Avg episode reward: [(0, '24.435')] +[2023-11-22 04:16:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3208.7, 300 sec: 3040.8). Total num frames: 3592192. Throughput: 0: 751.1. Samples: 898542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:16:06,204][05156] Avg episode reward: [(0, '23.138')] +[2023-11-22 04:16:08,890][06891] Updated weights for policy 0, policy_version 880 (0.0034) +[2023-11-22 04:16:11,195][05156] Fps is (10 sec: 4096.1, 60 sec: 3276.8, 300 sec: 3068.5). Total num frames: 3612672. Throughput: 0: 777.8. Samples: 901608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:16:11,198][05156] Avg episode reward: [(0, '21.819')] +[2023-11-22 04:16:16,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 3624960. Throughput: 0: 799.1. Samples: 906352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:16:16,203][05156] Avg episode reward: [(0, '22.080')] +[2023-11-22 04:16:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3068.5). Total num frames: 3637248. Throughput: 0: 800.9. Samples: 910256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:16:21,206][05156] Avg episode reward: [(0, '19.197')] +[2023-11-22 04:16:23,279][06891] Updated weights for policy 0, policy_version 890 (0.0035) +[2023-11-22 04:16:26,196][05156] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 3082.4). Total num frames: 3649536. Throughput: 0: 778.3. Samples: 912152. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-11-22 04:16:26,198][05156] Avg episode reward: [(0, '19.435')] +[2023-11-22 04:16:31,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3670016. Throughput: 0: 748.9. Samples: 916942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:16:31,204][05156] Avg episode reward: [(0, '19.907')] +[2023-11-22 04:16:34,819][06891] Updated weights for policy 0, policy_version 900 (0.0027) +[2023-11-22 04:16:36,195][05156] Fps is (10 sec: 4096.2, 60 sec: 3276.8, 300 sec: 3138.0). Total num frames: 3690496. Throughput: 0: 784.6. Samples: 922790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:16:36,198][05156] Avg episode reward: [(0, '21.841')] +[2023-11-22 04:16:41,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3702784. Throughput: 0: 795.3. Samples: 925248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:16:41,200][05156] Avg episode reward: [(0, '22.503')] +[2023-11-22 04:16:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 3715072. Throughput: 0: 796.1. Samples: 929142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:16:46,203][05156] Avg episode reward: [(0, '23.755')] +[2023-11-22 04:16:49,777][06891] Updated weights for policy 0, policy_version 910 (0.0013) +[2023-11-22 04:16:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3082.4). Total num frames: 3727360. Throughput: 0: 762.3. Samples: 932844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-11-22 04:16:51,205][05156] Avg episode reward: [(0, '22.778')] +[2023-11-22 04:16:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 3743744. Throughput: 0: 740.3. Samples: 934922. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:16:56,204][05156] Avg episode reward: [(0, '23.357')] +[2023-11-22 04:17:01,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3764224. Throughput: 0: 770.3. Samples: 941014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:17:01,203][05156] Avg episode reward: [(0, '25.043')] +[2023-11-22 04:17:01,428][06891] Updated weights for policy 0, policy_version 920 (0.0028) +[2023-11-22 04:17:06,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3780608. Throughput: 0: 798.6. Samples: 946192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-11-22 04:17:06,198][05156] Avg episode reward: [(0, '24.528')] +[2023-11-22 04:17:11,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 3796992. Throughput: 0: 800.9. Samples: 948194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-11-22 04:17:11,198][05156] Avg episode reward: [(0, '23.349')] +[2023-11-22 04:17:16,014][06891] Updated weights for policy 0, policy_version 930 (0.0055) +[2023-11-22 04:17:16,197][05156] Fps is (10 sec: 2866.8, 60 sec: 3071.9, 300 sec: 3096.3). Total num frames: 3809280. Throughput: 0: 781.2. Samples: 952098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-11-22 04:17:16,205][05156] Avg episode reward: [(0, '23.855')] +[2023-11-22 04:17:21,196][05156] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3110.2). Total num frames: 3825664. Throughput: 0: 752.8. Samples: 956668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-11-22 04:17:21,201][05156] Avg episode reward: [(0, '23.433')] +[2023-11-22 04:17:21,212][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000934_3825664.pth... 
+[2023-11-22 04:17:21,328][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000751_3076096.pth
+[2023-11-22 04:17:26,195][05156] Fps is (10 sec: 3686.9, 60 sec: 3276.8, 300 sec: 3138.0). Total num frames: 3846144. Throughput: 0: 764.5. Samples: 959650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:17:26,198][05156] Avg episode reward: [(0, '24.205')]
+[2023-11-22 04:17:27,031][06891] Updated weights for policy 0, policy_version 940 (0.0023)
+[2023-11-22 04:17:31,195][05156] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 3862528. Throughput: 0: 803.9. Samples: 965318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:17:31,201][05156] Avg episode reward: [(0, '24.389')]
+[2023-11-22 04:17:36,197][05156] Fps is (10 sec: 2866.8, 60 sec: 3071.9, 300 sec: 3124.1). Total num frames: 3874816. Throughput: 0: 807.0. Samples: 969162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:17:36,200][05156] Avg episode reward: [(0, '23.201')]
+[2023-11-22 04:17:41,203][05156] Fps is (10 sec: 2455.8, 60 sec: 3071.6, 300 sec: 3096.2). Total num frames: 3887104. Throughput: 0: 803.6. Samples: 971088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:17:41,209][05156] Avg episode reward: [(0, '23.697')]
+[2023-11-22 04:17:42,255][06891] Updated weights for policy 0, policy_version 950 (0.0024)
+[2023-11-22 04:17:46,195][05156] Fps is (10 sec: 2867.6, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3903488. Throughput: 0: 754.9. Samples: 974984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:17:46,205][05156] Avg episode reward: [(0, '23.941')]
+[2023-11-22 04:17:51,195][05156] Fps is (10 sec: 3279.2, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3919872. Throughput: 0: 770.3. Samples: 980854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:17:51,206][05156] Avg episode reward: [(0, '24.826')]
+[2023-11-22 04:17:53,453][06891] Updated weights for policy 0, policy_version 960 (0.0017)
+[2023-11-22 04:17:56,195][05156] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3151.8). Total num frames: 3940352. Throughput: 0: 793.5. Samples: 983902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-11-22 04:17:56,199][05156] Avg episode reward: [(0, '24.592')]
+[2023-11-22 04:18:01,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3952640. Throughput: 0: 801.1. Samples: 988146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-11-22 04:18:01,199][05156] Avg episode reward: [(0, '24.691')]
+[2023-11-22 04:18:06,202][05156] Fps is (10 sec: 2456.0, 60 sec: 3071.7, 300 sec: 3110.1). Total num frames: 3964928. Throughput: 0: 784.9. Samples: 991994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-11-22 04:18:06,204][05156] Avg episode reward: [(0, '25.224')]
+[2023-11-22 04:18:08,143][06891] Updated weights for policy 0, policy_version 970 (0.0022)
+[2023-11-22 04:18:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 3981312. Throughput: 0: 762.2. Samples: 993950. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-11-22 04:18:11,200][05156] Avg episode reward: [(0, '24.930')]
+[2023-11-22 04:18:16,195][05156] Fps is (10 sec: 3278.9, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 3997696. Throughput: 0: 756.6. Samples: 999366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-11-22 04:18:16,198][05156] Avg episode reward: [(0, '24.555')]
+[2023-11-22 04:18:17,314][06878] Stopping Batcher_0...
+[2023-11-22 04:18:17,314][05156] Component Batcher_0 stopped!
+[2023-11-22 04:18:17,315][06878] Loop batcher_evt_loop terminating...
+[2023-11-22 04:18:17,322][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2023-11-22 04:18:17,367][05156] Component RolloutWorker_w1 stopped!
+[2023-11-22 04:18:17,380][06893] Stopping RolloutWorker_w1...
+[2023-11-22 04:18:17,386][05156] Component RolloutWorker_w2 stopped!
+[2023-11-22 04:18:17,394][05156] Component RolloutWorker_w0 stopped!
+[2023-11-22 04:18:17,388][06894] Stopping RolloutWorker_w2...
+[2023-11-22 04:18:17,382][06893] Loop rollout_proc1_evt_loop terminating...
+[2023-11-22 04:18:17,396][06892] Stopping RolloutWorker_w0...
+[2023-11-22 04:18:17,406][06897] Stopping RolloutWorker_w5...
+[2023-11-22 04:18:17,406][05156] Component RolloutWorker_w5 stopped!
+[2023-11-22 04:18:17,412][05156] Component RolloutWorker_w4 stopped!
+[2023-11-22 04:18:17,414][06896] Stopping RolloutWorker_w4...
+[2023-11-22 04:18:17,397][06894] Loop rollout_proc2_evt_loop terminating...
+[2023-11-22 04:18:17,418][06891] Weights refcount: 2 0
+[2023-11-22 04:18:17,407][06897] Loop rollout_proc5_evt_loop terminating...
+[2023-11-22 04:18:17,421][05156] Component InferenceWorker_p0-w0 stopped!
+[2023-11-22 04:18:17,404][06892] Loop rollout_proc0_evt_loop terminating...
+[2023-11-22 04:18:17,420][06891] Stopping InferenceWorker_p0-w0...
+[2023-11-22 04:18:17,426][06891] Loop inference_proc0-0_evt_loop terminating...
+[2023-11-22 04:18:17,426][05156] Component RolloutWorker_w6 stopped!
+[2023-11-22 04:18:17,428][06898] Stopping RolloutWorker_w6...
+[2023-11-22 04:18:17,433][06899] Stopping RolloutWorker_w7...
+[2023-11-22 04:18:17,433][05156] Component RolloutWorker_w7 stopped!
+[2023-11-22 04:18:17,423][06896] Loop rollout_proc4_evt_loop terminating...
+[2023-11-22 04:18:17,433][06899] Loop rollout_proc7_evt_loop terminating...
+[2023-11-22 04:18:17,429][06898] Loop rollout_proc6_evt_loop terminating...
+[2023-11-22 04:18:17,469][05156] Component RolloutWorker_w3 stopped!
+[2023-11-22 04:18:17,469][06895] Stopping RolloutWorker_w3...
+[2023-11-22 04:18:17,476][06895] Loop rollout_proc3_evt_loop terminating...
+[2023-11-22 04:18:17,491][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth
+[2023-11-22 04:18:17,499][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2023-11-22 04:18:17,651][05156] Component LearnerWorker_p0 stopped!
+[2023-11-22 04:18:17,651][06878] Stopping LearnerWorker_p0...
+[2023-11-22 04:18:17,654][06878] Loop learner_proc0_evt_loop terminating...
+[2023-11-22 04:18:17,654][05156] Waiting for process learner_proc0 to stop...
+[2023-11-22 04:18:19,169][05156] Waiting for process inference_proc0-0 to join...
+[2023-11-22 04:18:19,173][05156] Waiting for process rollout_proc0 to join...
+[2023-11-22 04:18:20,972][05156] Waiting for process rollout_proc1 to join...
+[2023-11-22 04:18:21,240][05156] Waiting for process rollout_proc2 to join...
+[2023-11-22 04:18:21,247][05156] Waiting for process rollout_proc3 to join...
+[2023-11-22 04:18:21,249][05156] Waiting for process rollout_proc4 to join...
+[2023-11-22 04:18:21,252][05156] Waiting for process rollout_proc5 to join...
+[2023-11-22 04:18:21,254][05156] Waiting for process rollout_proc6 to join...
+[2023-11-22 04:18:21,255][05156] Waiting for process rollout_proc7 to join...
+[2023-11-22 04:18:21,256][05156] Batcher 0 profile tree view:
+batching: 28.1666, releasing_batches: 0.0388
+[2023-11-22 04:18:21,258][05156] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0001
+ wait_policy_total: 637.9062
+update_model: 9.6179
+ weight_update: 0.0025
+one_step: 0.0026
+ handle_policy_step: 656.3959
+ deserialize: 18.3586, stack: 3.5409, obs_to_device_normalize: 127.5471, forward: 363.0852, send_messages: 30.5910
+ prepare_outputs: 81.3628
+ to_cpu: 45.3745
+[2023-11-22 04:18:21,259][05156] Learner 0 profile tree view:
+misc: 0.0057, prepare_batch: 13.6861
+train: 74.5323
+ epoch_init: 0.0066, minibatch_init: 0.0068, losses_postprocess: 0.6013, kl_divergence: 0.6899, after_optimizer: 33.8195
+ calculate_losses: 26.9243
+ losses_init: 0.0036, forward_head: 1.5239, bptt_initial: 17.3557, tail: 1.3661, advantages_returns: 0.3833, losses: 3.6936
+ bptt: 2.2215
+ bptt_forward_core: 2.1011
+ update: 11.8193
+ clip: 0.9213
+[2023-11-22 04:18:21,261][05156] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.4902, enqueue_policy_requests: 195.7657, env_step: 987.9004, overhead: 29.8512, complete_rollouts: 8.2389
+save_policy_outputs: 25.3051
+ split_output_tensors: 12.0139
+[2023-11-22 04:18:21,263][05156] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.3827, enqueue_policy_requests: 199.8480, env_step: 980.8307, overhead: 29.6204, complete_rollouts: 8.4801
+save_policy_outputs: 26.1206
+ split_output_tensors: 12.0324
+[2023-11-22 04:18:21,265][05156] Loop Runner_EvtLoop terminating...
+[2023-11-22 04:18:21,267][05156] Runner profile tree view:
+main_loop: 1375.9910
+[2023-11-22 04:18:21,272][05156] Collected {0: 4005888}, FPS: 2911.3
+[2023-11-22 04:18:21,293][05156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-11-22 04:18:21,298][05156] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-11-22 04:18:21,299][05156] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-11-22 04:18:21,301][05156] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-11-22 04:18:21,302][05156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-11-22 04:18:21,304][05156] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-11-22 04:18:21,305][05156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-11-22 04:18:21,310][05156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-11-22 04:18:21,312][05156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-11-22 04:18:21,313][05156] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-11-22 04:18:21,313][05156] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-11-22 04:18:21,317][05156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-11-22 04:18:21,318][05156] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-11-22 04:18:21,319][05156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-11-22 04:18:21,320][05156] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-11-22 04:18:21,376][05156] RunningMeanStd input shape: (3, 72, 128)
+[2023-11-22 04:18:21,381][05156] RunningMeanStd input shape: (1,)
+[2023-11-22 04:18:21,410][05156] ConvEncoder: input_channels=3
+[2023-11-22 04:18:21,485][05156] Conv encoder output size: 512
+[2023-11-22 04:18:21,487][05156] Policy head output size: 512
+[2023-11-22 04:18:21,517][05156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2023-11-22 04:18:22,193][05156] Num frames 100...
+[2023-11-22 04:18:22,372][05156] Num frames 200...
+[2023-11-22 04:18:22,559][05156] Num frames 300...
+[2023-11-22 04:18:22,737][05156] Num frames 400...
+[2023-11-22 04:18:22,924][05156] Num frames 500...
+[2023-11-22 04:18:23,102][05156] Num frames 600...
+[2023-11-22 04:18:23,287][05156] Num frames 700...
+[2023-11-22 04:18:23,473][05156] Num frames 800...
+[2023-11-22 04:18:23,661][05156] Num frames 900...
+[2023-11-22 04:18:23,853][05156] Num frames 1000...
+[2023-11-22 04:18:24,036][05156] Num frames 1100...
+[2023-11-22 04:18:24,215][05156] Num frames 1200...
+[2023-11-22 04:18:24,398][05156] Num frames 1300...
+[2023-11-22 04:18:24,586][05156] Num frames 1400...
+[2023-11-22 04:18:24,773][05156] Num frames 1500...
+[2023-11-22 04:18:24,965][05156] Num frames 1600...
+[2023-11-22 04:18:25,150][05156] Num frames 1700...
+[2023-11-22 04:18:25,340][05156] Num frames 1800...
+[2023-11-22 04:18:25,533][05156] Num frames 1900...
+[2023-11-22 04:18:25,716][05156] Num frames 2000...
+[2023-11-22 04:18:25,907][05156] Num frames 2100...
+[2023-11-22 04:18:25,960][05156] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
+[2023-11-22 04:18:25,963][05156] Avg episode reward: 58.999, avg true_objective: 21.000
+[2023-11-22 04:18:26,153][05156] Num frames 2200...
+[2023-11-22 04:18:26,315][05156] Num frames 2300...
+[2023-11-22 04:18:26,449][05156] Num frames 2400...
+[2023-11-22 04:18:26,574][05156] Num frames 2500...
+[2023-11-22 04:18:26,691][05156] Avg episode rewards: #0: 33.739, true rewards: #0: 12.740
+[2023-11-22 04:18:26,693][05156] Avg episode reward: 33.739, avg true_objective: 12.740
+[2023-11-22 04:18:26,778][05156] Num frames 2600...
+[2023-11-22 04:18:26,905][05156] Num frames 2700...
+[2023-11-22 04:18:27,029][05156] Num frames 2800...
+[2023-11-22 04:18:27,157][05156] Num frames 2900...
+[2023-11-22 04:18:27,286][05156] Num frames 3000...
+[2023-11-22 04:18:27,420][05156] Num frames 3100...
+[2023-11-22 04:18:27,548][05156] Num frames 3200...
+[2023-11-22 04:18:27,682][05156] Num frames 3300...
+[2023-11-22 04:18:27,807][05156] Num frames 3400...
+[2023-11-22 04:18:27,936][05156] Num frames 3500...
+[2023-11-22 04:18:28,068][05156] Num frames 3600...
+[2023-11-22 04:18:28,198][05156] Num frames 3700...
+[2023-11-22 04:18:28,337][05156] Num frames 3800...
+[2023-11-22 04:18:28,466][05156] Num frames 3900...
+[2023-11-22 04:18:28,595][05156] Num frames 4000...
+[2023-11-22 04:18:28,724][05156] Num frames 4100...
+[2023-11-22 04:18:28,853][05156] Num frames 4200...
+[2023-11-22 04:18:28,968][05156] Avg episode rewards: #0: 37.146, true rewards: #0: 14.147
+[2023-11-22 04:18:28,970][05156] Avg episode reward: 37.146, avg true_objective: 14.147
+[2023-11-22 04:18:29,059][05156] Num frames 4300...
+[2023-11-22 04:18:29,186][05156] Num frames 4400...
+[2023-11-22 04:18:29,344][05156] Num frames 4500...
+[2023-11-22 04:18:29,476][05156] Num frames 4600...
+[2023-11-22 04:18:29,602][05156] Num frames 4700...
+[2023-11-22 04:18:29,729][05156] Num frames 4800...
+[2023-11-22 04:18:29,860][05156] Num frames 4900...
+[2023-11-22 04:18:29,987][05156] Num frames 5000...
+[2023-11-22 04:18:30,117][05156] Num frames 5100...
+[2023-11-22 04:18:30,242][05156] Num frames 5200...
+[2023-11-22 04:18:30,375][05156] Num frames 5300...
+[2023-11-22 04:18:30,502][05156] Num frames 5400...
+[2023-11-22 04:18:30,634][05156] Num frames 5500...
+[2023-11-22 04:18:30,770][05156] Num frames 5600...
+[2023-11-22 04:18:30,911][05156] Num frames 5700...
+[2023-11-22 04:18:31,045][05156] Num frames 5800...
+[2023-11-22 04:18:31,170][05156] Num frames 5900...
+[2023-11-22 04:18:31,295][05156] Num frames 6000...
+[2023-11-22 04:18:31,434][05156] Num frames 6100...
+[2023-11-22 04:18:31,565][05156] Num frames 6200...
+[2023-11-22 04:18:31,698][05156] Num frames 6300...
+[2023-11-22 04:18:31,809][05156] Avg episode rewards: #0: 42.359, true rewards: #0: 15.860
+[2023-11-22 04:18:31,811][05156] Avg episode reward: 42.359, avg true_objective: 15.860
+[2023-11-22 04:18:31,893][05156] Num frames 6400...
+[2023-11-22 04:18:32,019][05156] Num frames 6500...
+[2023-11-22 04:18:32,145][05156] Num frames 6600...
+[2023-11-22 04:18:32,272][05156] Num frames 6700...
+[2023-11-22 04:18:32,405][05156] Num frames 6800...
+[2023-11-22 04:18:32,532][05156] Num frames 6900...
+[2023-11-22 04:18:32,658][05156] Num frames 7000...
+[2023-11-22 04:18:32,780][05156] Num frames 7100...
+[2023-11-22 04:18:32,904][05156] Num frames 7200...
+[2023-11-22 04:18:33,032][05156] Num frames 7300...
+[2023-11-22 04:18:33,158][05156] Num frames 7400...
+[2023-11-22 04:18:33,282][05156] Num frames 7500...
+[2023-11-22 04:18:33,417][05156] Num frames 7600...
+[2023-11-22 04:18:33,545][05156] Num frames 7700...
+[2023-11-22 04:18:33,670][05156] Num frames 7800...
+[2023-11-22 04:18:33,799][05156] Num frames 7900...
+[2023-11-22 04:18:33,927][05156] Num frames 8000...
+[2023-11-22 04:18:34,065][05156] Num frames 8100...
+[2023-11-22 04:18:34,189][05156] Num frames 8200...
+[2023-11-22 04:18:34,316][05156] Num frames 8300...
+[2023-11-22 04:18:34,481][05156] Avg episode rewards: #0: 44.971, true rewards: #0: 16.772
+[2023-11-22 04:18:34,483][05156] Avg episode reward: 44.971, avg true_objective: 16.772
+[2023-11-22 04:18:34,506][05156] Num frames 8400...
+[2023-11-22 04:18:34,631][05156] Num frames 8500...
+[2023-11-22 04:18:34,756][05156] Num frames 8600...
+[2023-11-22 04:18:34,887][05156] Num frames 8700...
+[2023-11-22 04:18:35,013][05156] Num frames 8800...
+[2023-11-22 04:18:35,140][05156] Num frames 8900...
+[2023-11-22 04:18:35,277][05156] Num frames 9000...
+[2023-11-22 04:18:35,449][05156] Avg episode rewards: #0: 39.816, true rewards: #0: 15.150
+[2023-11-22 04:18:35,451][05156] Avg episode reward: 39.816, avg true_objective: 15.150
+[2023-11-22 04:18:35,474][05156] Num frames 9100...
+[2023-11-22 04:18:35,611][05156] Num frames 9200...
+[2023-11-22 04:18:35,747][05156] Num frames 9300...
+[2023-11-22 04:18:35,886][05156] Num frames 9400...
+[2023-11-22 04:18:36,025][05156] Num frames 9500...
+[2023-11-22 04:18:36,174][05156] Num frames 9600...
+[2023-11-22 04:18:36,322][05156] Num frames 9700...
+[2023-11-22 04:18:36,511][05156] Num frames 9800...
+[2023-11-22 04:18:36,693][05156] Num frames 9900...
+[2023-11-22 04:18:36,879][05156] Num frames 10000...
+[2023-11-22 04:18:37,067][05156] Num frames 10100...
+[2023-11-22 04:18:37,255][05156] Num frames 10200...
+[2023-11-22 04:18:37,390][05156] Avg episode rewards: #0: 37.631, true rewards: #0: 14.631
+[2023-11-22 04:18:37,392][05156] Avg episode reward: 37.631, avg true_objective: 14.631
+[2023-11-22 04:18:37,514][05156] Num frames 10300...
+[2023-11-22 04:18:37,697][05156] Num frames 10400...
+[2023-11-22 04:18:37,878][05156] Num frames 10500...
+[2023-11-22 04:18:38,059][05156] Num frames 10600...
+[2023-11-22 04:18:38,245][05156] Num frames 10700...
+[2023-11-22 04:18:38,423][05156] Num frames 10800...
+[2023-11-22 04:18:38,624][05156] Num frames 10900...
+[2023-11-22 04:18:38,808][05156] Num frames 11000...
+[2023-11-22 04:18:38,996][05156] Num frames 11100...
+[2023-11-22 04:18:39,190][05156] Num frames 11200...
+[2023-11-22 04:18:39,371][05156] Num frames 11300...
+[2023-11-22 04:18:39,487][05156] Avg episode rewards: #0: 36.162, true rewards: #0: 14.163
+[2023-11-22 04:18:39,490][05156] Avg episode reward: 36.162, avg true_objective: 14.163
+[2023-11-22 04:18:39,643][05156] Num frames 11400...
+[2023-11-22 04:18:39,823][05156] Num frames 11500...
+[2023-11-22 04:18:40,025][05156] Num frames 11600...
+[2023-11-22 04:18:40,213][05156] Num frames 11700...
+[2023-11-22 04:18:40,413][05156] Num frames 11800...
+[2023-11-22 04:18:40,597][05156] Num frames 11900...
+[2023-11-22 04:18:40,789][05156] Num frames 12000...
+[2023-11-22 04:18:40,973][05156] Num frames 12100...
+[2023-11-22 04:18:41,157][05156] Num frames 12200...
+[2023-11-22 04:18:41,342][05156] Num frames 12300...
+[2023-11-22 04:18:41,523][05156] Num frames 12400...
+[2023-11-22 04:18:41,704][05156] Num frames 12500...
+[2023-11-22 04:18:41,826][05156] Num frames 12600...
+[2023-11-22 04:18:41,948][05156] Avg episode rewards: #0: 36.725, true rewards: #0: 14.059
+[2023-11-22 04:18:41,950][05156] Avg episode reward: 36.725, avg true_objective: 14.059
+[2023-11-22 04:18:42,015][05156] Num frames 12700...
+[2023-11-22 04:18:42,147][05156] Num frames 12800...
+[2023-11-22 04:18:42,279][05156] Num frames 12900...
+[2023-11-22 04:18:42,404][05156] Num frames 13000...
+[2023-11-22 04:18:42,539][05156] Num frames 13100...
+[2023-11-22 04:18:42,644][05156] Avg episode rewards: #0: 34.037, true rewards: #0: 13.137
+[2023-11-22 04:18:42,646][05156] Avg episode reward: 34.037, avg true_objective: 13.137
+[2023-11-22 04:20:10,649][05156] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-11-22 04:20:15,985][05156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-11-22 04:20:15,989][05156] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-11-22 04:20:15,994][05156] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-11-22 04:20:15,995][05156] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-11-22 04:20:15,997][05156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-11-22 04:20:16,001][05156] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-11-22 04:20:16,002][05156] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-11-22 04:20:16,003][05156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-11-22 04:20:16,004][05156] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-11-22 04:20:16,007][05156] Adding new argument 'hf_repository'='tommylam/PPO-doomHealthGatheringSupreme' that is not in the saved config file!
+[2023-11-22 04:20:16,015][05156] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-11-22 04:20:16,018][05156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-11-22 04:20:16,019][05156] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-11-22 04:20:16,020][05156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-11-22 04:20:16,021][05156] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-11-22 04:20:16,076][05156] RunningMeanStd input shape: (3, 72, 128)
+[2023-11-22 04:20:16,079][05156] RunningMeanStd input shape: (1,)
+[2023-11-22 04:20:16,097][05156] ConvEncoder: input_channels=3
+[2023-11-22 04:20:16,158][05156] Conv encoder output size: 512
+[2023-11-22 04:20:16,160][05156] Policy head output size: 512
+[2023-11-22 04:20:16,188][05156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2023-11-22 04:20:16,812][05156] Num frames 100...
+[2023-11-22 04:20:16,991][05156] Num frames 200...
+[2023-11-22 04:20:17,168][05156] Num frames 300...
+[2023-11-22 04:20:17,353][05156] Num frames 400...
+[2023-11-22 04:20:17,545][05156] Num frames 500...
+[2023-11-22 04:20:17,729][05156] Num frames 600...
+[2023-11-22 04:20:17,936][05156] Avg episode rewards: #0: 12.720, true rewards: #0: 6.720
+[2023-11-22 04:20:17,942][05156] Avg episode reward: 12.720, avg true_objective: 6.720
+[2023-11-22 04:20:18,004][05156] Num frames 700...
+[2023-11-22 04:20:18,193][05156] Num frames 800...
+[2023-11-22 04:20:18,377][05156] Num frames 900...
+[2023-11-22 04:20:18,569][05156] Num frames 1000...
+[2023-11-22 04:20:18,760][05156] Num frames 1100...
+[2023-11-22 04:20:18,952][05156] Num frames 1200...
+[2023-11-22 04:20:19,153][05156] Num frames 1300...
+[2023-11-22 04:20:19,339][05156] Num frames 1400...
+[2023-11-22 04:20:19,525][05156] Num frames 1500...
+[2023-11-22 04:20:19,707][05156] Num frames 1600...
+[2023-11-22 04:20:19,889][05156] Num frames 1700...
+[2023-11-22 04:20:20,069][05156] Num frames 1800...
+[2023-11-22 04:20:20,250][05156] Num frames 1900...
+[2023-11-22 04:20:20,378][05156] Num frames 2000...
+[2023-11-22 04:20:20,509][05156] Num frames 2100...
+[2023-11-22 04:20:20,634][05156] Num frames 2200...
+[2023-11-22 04:20:20,761][05156] Num frames 2300...
+[2023-11-22 04:20:20,898][05156] Num frames 2400...
+[2023-11-22 04:20:21,040][05156] Num frames 2500...
+[2023-11-22 04:20:21,181][05156] Num frames 2600...
+[2023-11-22 04:20:21,320][05156] Num frames 2700...
+[2023-11-22 04:20:21,475][05156] Avg episode rewards: #0: 30.860, true rewards: #0: 13.860
+[2023-11-22 04:20:21,477][05156] Avg episode reward: 30.860, avg true_objective: 13.860
+[2023-11-22 04:20:21,521][05156] Num frames 2800...
+[2023-11-22 04:20:21,659][05156] Num frames 2900...
+[2023-11-22 04:20:21,796][05156] Num frames 3000...
+[2023-11-22 04:20:21,928][05156] Num frames 3100...
+[2023-11-22 04:20:22,066][05156] Num frames 3200...
+[2023-11-22 04:20:22,191][05156] Num frames 3300...
+[2023-11-22 04:20:22,319][05156] Num frames 3400...
+[2023-11-22 04:20:22,451][05156] Num frames 3500...
+[2023-11-22 04:20:22,585][05156] Num frames 3600...
+[2023-11-22 04:20:22,710][05156] Num frames 3700...
+[2023-11-22 04:20:22,835][05156] Num frames 3800...
+[2023-11-22 04:20:22,967][05156] Avg episode rewards: #0: 28.866, true rewards: #0: 12.867
+[2023-11-22 04:20:22,969][05156] Avg episode reward: 28.866, avg true_objective: 12.867
+[2023-11-22 04:20:23,034][05156] Num frames 3900...
+[2023-11-22 04:20:23,157][05156] Num frames 4000...
+[2023-11-22 04:20:23,290][05156] Num frames 4100...
+[2023-11-22 04:20:23,417][05156] Num frames 4200...
+[2023-11-22 04:20:23,547][05156] Num frames 4300...
+[2023-11-22 04:20:23,670][05156] Num frames 4400...
+[2023-11-22 04:20:23,799][05156] Num frames 4500...
+[2023-11-22 04:20:23,925][05156] Num frames 4600...
+[2023-11-22 04:20:24,060][05156] Num frames 4700...
+[2023-11-22 04:20:24,186][05156] Num frames 4800...
+[2023-11-22 04:20:24,313][05156] Num frames 4900...
+[2023-11-22 04:20:24,390][05156] Avg episode rewards: #0: 26.790, true rewards: #0: 12.290
+[2023-11-22 04:20:24,391][05156] Avg episode reward: 26.790, avg true_objective: 12.290
+[2023-11-22 04:20:24,506][05156] Num frames 5000...
+[2023-11-22 04:20:24,634][05156] Num frames 5100...
+[2023-11-22 04:20:24,759][05156] Num frames 5200...
+[2023-11-22 04:20:24,886][05156] Num frames 5300...
+[2023-11-22 04:20:25,012][05156] Num frames 5400...
+[2023-11-22 04:20:25,146][05156] Num frames 5500...
+[2023-11-22 04:20:25,271][05156] Num frames 5600...
+[2023-11-22 04:20:25,401][05156] Num frames 5700...
+[2023-11-22 04:20:25,528][05156] Num frames 5800...
+[2023-11-22 04:20:25,660][05156] Num frames 5900...
+[2023-11-22 04:20:25,785][05156] Num frames 6000...
+[2023-11-22 04:20:25,913][05156] Num frames 6100...
+[2023-11-22 04:20:26,051][05156] Num frames 6200...
+[2023-11-22 04:20:26,186][05156] Num frames 6300...
+[2023-11-22 04:20:26,303][05156] Avg episode rewards: #0: 28.688, true rewards: #0: 12.688
+[2023-11-22 04:20:26,305][05156] Avg episode reward: 28.688, avg true_objective: 12.688
+[2023-11-22 04:20:26,384][05156] Num frames 6400...
+[2023-11-22 04:20:26,520][05156] Num frames 6500...
+[2023-11-22 04:20:26,652][05156] Num frames 6600...
+[2023-11-22 04:20:26,785][05156] Num frames 6700...
+[2023-11-22 04:20:26,909][05156] Num frames 6800...
+[2023-11-22 04:20:27,035][05156] Num frames 6900...
+[2023-11-22 04:20:27,173][05156] Num frames 7000...
+[2023-11-22 04:20:27,299][05156] Num frames 7100...
+[2023-11-22 04:20:27,433][05156] Num frames 7200...
+[2023-11-22 04:20:27,569][05156] Num frames 7300...
+[2023-11-22 04:20:27,697][05156] Num frames 7400...
+[2023-11-22 04:20:27,822][05156] Num frames 7500...
+[2023-11-22 04:20:27,951][05156] Num frames 7600...
+[2023-11-22 04:20:28,088][05156] Num frames 7700...
+[2023-11-22 04:20:28,212][05156] Num frames 7800...
+[2023-11-22 04:20:28,288][05156] Avg episode rewards: #0: 29.693, true rewards: #0: 13.027
+[2023-11-22 04:20:28,290][05156] Avg episode reward: 29.693, avg true_objective: 13.027
+[2023-11-22 04:20:28,399][05156] Num frames 7900...
+[2023-11-22 04:20:28,529][05156] Num frames 8000...
+[2023-11-22 04:20:28,654][05156] Num frames 8100...
+[2023-11-22 04:20:28,779][05156] Num frames 8200...
+[2023-11-22 04:20:28,907][05156] Num frames 8300...
+[2023-11-22 04:20:29,031][05156] Num frames 8400...
+[2023-11-22 04:20:29,164][05156] Num frames 8500...
+[2023-11-22 04:20:29,297][05156] Num frames 8600...
+[2023-11-22 04:20:29,388][05156] Avg episode rewards: #0: 27.754, true rewards: #0: 12.326
+[2023-11-22 04:20:29,390][05156] Avg episode reward: 27.754, avg true_objective: 12.326
+[2023-11-22 04:20:29,490][05156] Num frames 8700...
+[2023-11-22 04:20:29,613][05156] Num frames 8800...
+[2023-11-22 04:20:29,740][05156] Num frames 8900...
+[2023-11-22 04:20:29,901][05156] Num frames 9000...
+[2023-11-22 04:20:30,024][05156] Num frames 9100...
+[2023-11-22 04:20:30,161][05156] Num frames 9200...
+[2023-11-22 04:20:30,313][05156] Num frames 9300...
+[2023-11-22 04:20:30,502][05156] Num frames 9400...
+[2023-11-22 04:20:30,692][05156] Num frames 9500...
+[2023-11-22 04:20:30,869][05156] Num frames 9600...
+[2023-11-22 04:20:31,062][05156] Num frames 9700...
+[2023-11-22 04:20:31,150][05156] Avg episode rewards: #0: 27.145, true rewards: #0: 12.145
+[2023-11-22 04:20:31,152][05156] Avg episode reward: 27.145, avg true_objective: 12.145
+[2023-11-22 04:20:31,314][05156] Num frames 9800...
+[2023-11-22 04:20:31,495][05156] Num frames 9900...
+[2023-11-22 04:20:31,680][05156] Num frames 10000...
+[2023-11-22 04:20:31,868][05156] Num frames 10100...
+[2023-11-22 04:20:32,053][05156] Num frames 10200...
+[2023-11-22 04:20:32,251][05156] Num frames 10300...
+[2023-11-22 04:20:32,438][05156] Num frames 10400...
+[2023-11-22 04:20:32,636][05156] Num frames 10500...
+[2023-11-22 04:20:32,817][05156] Num frames 10600...
+[2023-11-22 04:20:32,996][05156] Num frames 10700...
+[2023-11-22 04:20:33,183][05156] Num frames 10800...
+[2023-11-22 04:20:33,369][05156] Num frames 10900...
+[2023-11-22 04:20:33,561][05156] Num frames 11000...
+[2023-11-22 04:20:33,748][05156] Num frames 11100...
+[2023-11-22 04:20:33,933][05156] Num frames 11200...
+[2023-11-22 04:20:34,122][05156] Num frames 11300...
+[2023-11-22 04:20:34,311][05156] Num frames 11400...
+[2023-11-22 04:20:34,516][05156] Avg episode rewards: #0: 29.639, true rewards: #0: 12.750
+[2023-11-22 04:20:34,518][05156] Avg episode reward: 29.639, avg true_objective: 12.750
+[2023-11-22 04:20:34,572][05156] Num frames 11500...
+[2023-11-22 04:20:34,754][05156] Num frames 11600...
+[2023-11-22 04:20:34,949][05156] Num frames 11700...
+[2023-11-22 04:20:35,132][05156] Num frames 11800...
+[2023-11-22 04:20:35,338][05156] Num frames 11900...
+[2023-11-22 04:20:35,524][05156] Num frames 12000...
+[2023-11-22 04:20:35,713][05156] Num frames 12100...
+[2023-11-22 04:20:35,844][05156] Num frames 12200...
+[2023-11-22 04:20:35,969][05156] Num frames 12300...
+[2023-11-22 04:20:36,097][05156] Num frames 12400...
+[2023-11-22 04:20:36,230][05156] Num frames 12500...
+[2023-11-22 04:20:36,359][05156] Num frames 12600...
+[2023-11-22 04:20:36,488][05156] Num frames 12700...
+[2023-11-22 04:20:36,616][05156] Num frames 12800...
+[2023-11-22 04:20:36,745][05156] Num frames 12900...
+[2023-11-22 04:20:36,868][05156] Num frames 13000...
+[2023-11-22 04:20:37,025][05156] Avg episode rewards: #0: 30.880, true rewards: #0: 13.080
+[2023-11-22 04:20:37,026][05156] Avg episode reward: 30.880, avg true_objective: 13.080
+[2023-11-22 04:22:06,299][05156] Replay video saved to /content/train_dir/default_experiment/replay.mp4!