diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644 --- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1158 @@ +[2024-08-15 20:17:28,894][00707] Saving configuration to /content/train_dir/default_experiment/config.json... +[2024-08-15 20:17:28,897][00707] Rollout worker 0 uses device cpu +[2024-08-15 20:17:28,898][00707] Rollout worker 1 uses device cpu +[2024-08-15 20:17:28,900][00707] Rollout worker 2 uses device cpu +[2024-08-15 20:17:28,901][00707] Rollout worker 3 uses device cpu +[2024-08-15 20:17:28,903][00707] Rollout worker 4 uses device cpu +[2024-08-15 20:17:28,904][00707] Rollout worker 5 uses device cpu +[2024-08-15 20:17:28,905][00707] Rollout worker 6 uses device cpu +[2024-08-15 20:17:28,906][00707] Rollout worker 7 uses device cpu +[2024-08-15 20:17:29,072][00707] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-15 20:17:29,073][00707] InferenceWorker_p0-w0: min num requests: 2 +[2024-08-15 20:17:29,108][00707] Starting all processes... +[2024-08-15 20:17:29,110][00707] Starting process learner_proc0 +[2024-08-15 20:17:31,368][00707] Starting all processes...
+[2024-08-15 20:17:31,387][00707] Starting process inference_proc0-0 +[2024-08-15 20:17:31,388][00707] Starting process rollout_proc0 +[2024-08-15 20:17:31,389][00707] Starting process rollout_proc1 +[2024-08-15 20:17:31,389][00707] Starting process rollout_proc2 +[2024-08-15 20:17:31,390][00707] Starting process rollout_proc3 +[2024-08-15 20:17:31,390][00707] Starting process rollout_proc4 +[2024-08-15 20:17:31,395][00707] Starting process rollout_proc5 +[2024-08-15 20:17:31,395][00707] Starting process rollout_proc6 +[2024-08-15 20:17:31,395][00707] Starting process rollout_proc7 +[2024-08-15 20:17:47,498][03485] Worker 1 uses CPU cores [1] +[2024-08-15 20:17:47,499][03490] Worker 6 uses CPU cores [0] +[2024-08-15 20:17:47,503][03470] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-15 20:17:47,506][03470] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-08-15 20:17:47,569][03486] Worker 2 uses CPU cores [0] +[2024-08-15 20:17:47,607][03470] Num visible devices: 1 +[2024-08-15 20:17:47,644][03470] Starting seed is not provided +[2024-08-15 20:17:47,645][03470] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-15 20:17:47,646][03470] Initializing actor-critic model on device cuda:0 +[2024-08-15 20:17:47,647][03470] RunningMeanStd input shape: (3, 72, 128) +[2024-08-15 20:17:47,651][03470] RunningMeanStd input shape: (1,) +[2024-08-15 20:17:47,708][03470] ConvEncoder: input_channels=3 +[2024-08-15 20:17:47,719][03484] Worker 0 uses CPU cores [0] +[2024-08-15 20:17:47,731][03487] Worker 3 uses CPU cores [1] +[2024-08-15 20:17:47,793][03483] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-15 20:17:47,794][03483] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-08-15 20:17:47,807][03489] Worker 4 uses CPU cores [0] +[2024-08-15 20:17:47,820][03483] Num visible devices: 1 +[2024-08-15 20:17:47,835][03491] Worker 7 uses CPU 
cores [1] +[2024-08-15 20:17:47,897][03488] Worker 5 uses CPU cores [1] +[2024-08-15 20:17:48,016][03470] Conv encoder output size: 512 +[2024-08-15 20:17:48,016][03470] Policy head output size: 512 +[2024-08-15 20:17:48,071][03470] Created Actor Critic model with architecture: +[2024-08-15 20:17:48,071][03470] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-08-15 20:17:48,341][03470] Using optimizer +[2024-08-15 20:17:49,064][00707] Heartbeat connected on Batcher_0 +[2024-08-15 20:17:49,074][00707] Heartbeat connected on InferenceWorker_p0-w0 +[2024-08-15 20:17:49,083][00707] Heartbeat connected on RolloutWorker_w0 +[2024-08-15 20:17:49,086][00707] Heartbeat connected on RolloutWorker_w1 +[2024-08-15 20:17:49,090][00707] Heartbeat connected on RolloutWorker_w2 
+[2024-08-15 20:17:49,093][00707] Heartbeat connected on RolloutWorker_w3 +[2024-08-15 20:17:49,098][00707] Heartbeat connected on RolloutWorker_w4 +[2024-08-15 20:17:49,100][00707] Heartbeat connected on RolloutWorker_w5 +[2024-08-15 20:17:49,108][00707] Heartbeat connected on RolloutWorker_w7 +[2024-08-15 20:17:49,109][00707] Heartbeat connected on RolloutWorker_w6 +[2024-08-15 20:17:49,204][03470] No checkpoints found +[2024-08-15 20:17:49,205][03470] Did not load from checkpoint, starting from scratch! +[2024-08-15 20:17:49,205][03470] Initialized policy 0 weights for model version 0 +[2024-08-15 20:17:49,208][03470] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-15 20:17:49,215][03470] LearnerWorker_p0 finished initialization! +[2024-08-15 20:17:49,218][00707] Heartbeat connected on LearnerWorker_p0 +[2024-08-15 20:17:49,371][03483] RunningMeanStd input shape: (3, 72, 128) +[2024-08-15 20:17:49,372][03483] RunningMeanStd input shape: (1,) +[2024-08-15 20:17:49,386][03483] ConvEncoder: input_channels=3 +[2024-08-15 20:17:49,505][03483] Conv encoder output size: 512 +[2024-08-15 20:17:49,505][03483] Policy head output size: 512 +[2024-08-15 20:17:49,562][00707] Inference worker 0-0 is ready! +[2024-08-15 20:17:49,564][00707] All inference workers are ready! Signal rollout workers to start! +[2024-08-15 20:17:49,572][00707] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-08-15 20:17:49,881][03487] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:49,906][03489] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:49,979][03486] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:49,986][03488] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:49,990][03484] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:50,060][03490] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:50,105][03491] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:50,128][03485] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:17:51,745][03489] Decorrelating experience for 0 frames... +[2024-08-15 20:17:51,745][03487] Decorrelating experience for 0 frames... +[2024-08-15 20:17:51,746][03491] Decorrelating experience for 0 frames... +[2024-08-15 20:17:51,746][03490] Decorrelating experience for 0 frames... +[2024-08-15 20:17:51,744][03485] Decorrelating experience for 0 frames... +[2024-08-15 20:17:52,559][03490] Decorrelating experience for 32 frames... +[2024-08-15 20:17:52,557][03489] Decorrelating experience for 32 frames... +[2024-08-15 20:17:52,960][03487] Decorrelating experience for 32 frames... +[2024-08-15 20:17:52,962][03491] Decorrelating experience for 32 frames... +[2024-08-15 20:17:52,965][03485] Decorrelating experience for 32 frames... +[2024-08-15 20:17:54,150][03488] Decorrelating experience for 0 frames... +[2024-08-15 20:17:54,226][03490] Decorrelating experience for 64 frames... +[2024-08-15 20:17:54,229][03489] Decorrelating experience for 64 frames... +[2024-08-15 20:17:54,342][03487] Decorrelating experience for 64 frames... +[2024-08-15 20:17:54,462][03484] Decorrelating experience for 0 frames... +[2024-08-15 20:17:54,572][00707] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-08-15 20:17:55,191][03490] Decorrelating experience for 96 frames... +[2024-08-15 20:17:55,282][03488] Decorrelating experience for 32 frames... +[2024-08-15 20:17:55,375][03489] Decorrelating experience for 96 frames... +[2024-08-15 20:17:56,133][03491] Decorrelating experience for 64 frames... +[2024-08-15 20:17:56,430][03487] Decorrelating experience for 96 frames... +[2024-08-15 20:17:57,143][03486] Decorrelating experience for 0 frames... +[2024-08-15 20:17:59,091][03484] Decorrelating experience for 32 frames... +[2024-08-15 20:17:59,424][03488] Decorrelating experience for 64 frames... +[2024-08-15 20:17:59,572][00707] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 37.0. Samples: 370. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-08-15 20:17:59,579][00707] Avg episode reward: [(0, '1.352')] +[2024-08-15 20:18:00,634][03491] Decorrelating experience for 96 frames... +[2024-08-15 20:18:02,011][03486] Decorrelating experience for 32 frames... +[2024-08-15 20:18:04,572][00707] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 85.1. Samples: 1276. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-08-15 20:18:04,578][00707] Avg episode reward: [(0, '3.085')] +[2024-08-15 20:18:05,276][03488] Decorrelating experience for 96 frames... +[2024-08-15 20:18:07,900][03485] Decorrelating experience for 64 frames... +[2024-08-15 20:18:08,406][03470] Signal inference workers to stop experience collection... +[2024-08-15 20:18:08,449][03483] InferenceWorker_p0-w0: stopping experience collection +[2024-08-15 20:18:08,579][03484] Decorrelating experience for 64 frames... +[2024-08-15 20:18:09,255][03485] Decorrelating experience for 96 frames... +[2024-08-15 20:18:09,572][00707] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 113.0. Samples: 2260. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-08-15 20:18:09,574][00707] Avg episode reward: [(0, '3.344')] +[2024-08-15 20:18:09,643][03486] Decorrelating experience for 64 frames... +[2024-08-15 20:18:10,212][03484] Decorrelating experience for 96 frames... +[2024-08-15 20:18:10,409][03470] Signal inference workers to resume experience collection... +[2024-08-15 20:18:10,411][03483] InferenceWorker_p0-w0: resuming experience collection +[2024-08-15 20:18:10,484][03486] Decorrelating experience for 96 frames... +[2024-08-15 20:18:14,573][00707] Fps is (10 sec: 2047.9, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 20480. Throughput: 0: 206.8. Samples: 5170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:18:14,575][00707] Avg episode reward: [(0, '3.444')] +[2024-08-15 20:18:19,572][00707] Fps is (10 sec: 3686.4, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 36864. Throughput: 0: 275.7. Samples: 8272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:18:19,579][00707] Avg episode reward: [(0, '3.980')] +[2024-08-15 20:18:19,679][03483] Updated weights for policy 0, policy_version 10 (0.0050) +[2024-08-15 20:18:24,576][00707] Fps is (10 sec: 2866.4, 60 sec: 1404.2, 300 sec: 1404.2). Total num frames: 49152. Throughput: 0: 352.3. Samples: 12332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:18:24,580][00707] Avg episode reward: [(0, '4.165')] +[2024-08-15 20:18:29,572][00707] Fps is (10 sec: 2867.2, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 65536. Throughput: 0: 420.7. Samples: 16830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:18:29,578][00707] Avg episode reward: [(0, '4.432')] +[2024-08-15 20:18:32,671][03483] Updated weights for policy 0, policy_version 20 (0.0046) +[2024-08-15 20:18:34,572][00707] Fps is (10 sec: 3687.5, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 86016. Throughput: 0: 442.8. Samples: 19926. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:18:34,582][00707] Avg episode reward: [(0, '4.486')] +[2024-08-15 20:18:39,572][00707] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 102400. Throughput: 0: 572.9. Samples: 25780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:18:39,577][00707] Avg episode reward: [(0, '4.458')] +[2024-08-15 20:18:39,654][03470] Saving new best policy, reward=4.458! +[2024-08-15 20:18:44,575][00707] Fps is (10 sec: 2866.6, 60 sec: 2085.1, 300 sec: 2085.1). Total num frames: 114688. Throughput: 0: 646.3. Samples: 29456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:18:44,578][00707] Avg episode reward: [(0, '4.541')] +[2024-08-15 20:18:44,589][03470] Saving new best policy, reward=4.541! +[2024-08-15 20:18:45,773][03483] Updated weights for policy 0, policy_version 30 (0.0020) +[2024-08-15 20:18:49,572][00707] Fps is (10 sec: 3276.8, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 135168. Throughput: 0: 683.5. Samples: 32034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:18:49,577][00707] Avg episode reward: [(0, '4.477')] +[2024-08-15 20:18:54,573][00707] Fps is (10 sec: 4096.7, 60 sec: 2594.1, 300 sec: 2394.6). Total num frames: 155648. Throughput: 0: 802.2. Samples: 38358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:18:54,580][00707] Avg episode reward: [(0, '4.543')] +[2024-08-15 20:18:54,592][03470] Saving new best policy, reward=4.543! +[2024-08-15 20:18:56,380][03483] Updated weights for policy 0, policy_version 40 (0.0023) +[2024-08-15 20:18:59,574][00707] Fps is (10 sec: 3276.4, 60 sec: 2798.9, 300 sec: 2399.0). Total num frames: 167936. Throughput: 0: 836.2. Samples: 42800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:18:59,577][00707] Avg episode reward: [(0, '4.523')] +[2024-08-15 20:19:04,572][00707] Fps is (10 sec: 2867.4, 60 sec: 3072.0, 300 sec: 2457.6). Total num frames: 184320. 
Throughput: 0: 809.6. Samples: 44704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:19:04,577][00707] Avg episode reward: [(0, '4.504')] +[2024-08-15 20:19:08,861][03483] Updated weights for policy 0, policy_version 50 (0.0020) +[2024-08-15 20:19:09,572][00707] Fps is (10 sec: 3686.8, 60 sec: 3413.3, 300 sec: 2560.0). Total num frames: 204800. Throughput: 0: 848.7. Samples: 50522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:19:09,579][00707] Avg episode reward: [(0, '4.408')] +[2024-08-15 20:19:14,580][00707] Fps is (10 sec: 4093.1, 60 sec: 3412.9, 300 sec: 2650.1). Total num frames: 225280. Throughput: 0: 881.6. Samples: 56510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:19:14,583][00707] Avg episode reward: [(0, '4.487')] +[2024-08-15 20:19:19,574][00707] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 2639.6). Total num frames: 237568. Throughput: 0: 853.5. Samples: 58334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:19:19,582][00707] Avg episode reward: [(0, '4.359')] +[2024-08-15 20:19:21,698][03483] Updated weights for policy 0, policy_version 60 (0.0026) +[2024-08-15 20:19:24,573][00707] Fps is (10 sec: 2869.2, 60 sec: 3413.5, 300 sec: 2673.2). Total num frames: 253952. Throughput: 0: 823.6. Samples: 62842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:19:24,575][00707] Avg episode reward: [(0, '4.450')] +[2024-08-15 20:19:24,590][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000062_253952.pth... +[2024-08-15 20:19:29,572][00707] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 2744.3). Total num frames: 274432. Throughput: 0: 878.0. Samples: 68966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:19:29,574][00707] Avg episode reward: [(0, '4.643')] +[2024-08-15 20:19:29,578][03470] Saving new best policy, reward=4.643! 
+[2024-08-15 20:19:32,329][03483] Updated weights for policy 0, policy_version 70 (0.0037) +[2024-08-15 20:19:34,578][00707] Fps is (10 sec: 3684.6, 60 sec: 3413.0, 300 sec: 2769.5). Total num frames: 290816. Throughput: 0: 924.1. Samples: 73624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:19:34,580][00707] Avg episode reward: [(0, '4.594')] +[2024-08-15 20:19:39,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2755.5). Total num frames: 303104. Throughput: 0: 825.6. Samples: 75508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:19:39,578][00707] Avg episode reward: [(0, '4.503')] +[2024-08-15 20:19:44,572][00707] Fps is (10 sec: 3278.5, 60 sec: 3481.7, 300 sec: 2813.8). Total num frames: 323584. Throughput: 0: 855.9. Samples: 81314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:19:44,580][00707] Avg episode reward: [(0, '4.271')] +[2024-08-15 20:19:44,762][03483] Updated weights for policy 0, policy_version 80 (0.0024) +[2024-08-15 20:19:49,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2867.2). Total num frames: 344064. Throughput: 0: 947.7. Samples: 87350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:19:49,574][00707] Avg episode reward: [(0, '4.218')] +[2024-08-15 20:19:54,577][00707] Fps is (10 sec: 3275.4, 60 sec: 3344.9, 300 sec: 2850.7). Total num frames: 356352. Throughput: 0: 860.6. Samples: 89254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:19:54,579][00707] Avg episode reward: [(0, '4.291')] +[2024-08-15 20:19:57,790][03483] Updated weights for policy 0, policy_version 90 (0.0025) +[2024-08-15 20:19:59,573][00707] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 2867.2). Total num frames: 372736. Throughput: 0: 825.6. Samples: 93656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:19:59,575][00707] Avg episode reward: [(0, '4.395')] +[2024-08-15 20:20:04,572][00707] Fps is (10 sec: 4097.7, 60 sec: 3549.9, 300 sec: 2943.0). 
Total num frames: 397312. Throughput: 0: 854.3. Samples: 96778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:20:04,575][00707] Avg episode reward: [(0, '4.570')] +[2024-08-15 20:20:07,602][03483] Updated weights for policy 0, policy_version 100 (0.0022) +[2024-08-15 20:20:09,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 2955.0). Total num frames: 413696. Throughput: 0: 891.3. Samples: 102952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:20:09,578][00707] Avg episode reward: [(0, '4.757')] +[2024-08-15 20:20:09,584][03470] Saving new best policy, reward=4.757! +[2024-08-15 20:20:14,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3345.5, 300 sec: 2937.8). Total num frames: 425984. Throughput: 0: 837.4. Samples: 106648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:20:14,576][00707] Avg episode reward: [(0, '4.545')] +[2024-08-15 20:20:19,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 2976.4). Total num frames: 446464. Throughput: 0: 857.9. Samples: 112226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:20:19,579][00707] Avg episode reward: [(0, '4.563')] +[2024-08-15 20:20:20,676][03483] Updated weights for policy 0, policy_version 110 (0.0035) +[2024-08-15 20:20:24,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3012.5). Total num frames: 466944. Throughput: 0: 884.6. Samples: 115314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:20:24,579][00707] Avg episode reward: [(0, '4.468')] +[2024-08-15 20:20:29,575][00707] Fps is (10 sec: 3276.1, 60 sec: 3413.2, 300 sec: 2995.2). Total num frames: 479232. Throughput: 0: 866.4. Samples: 120306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:20:29,578][00707] Avg episode reward: [(0, '4.627')] +[2024-08-15 20:20:34,499][03483] Updated weights for policy 0, policy_version 120 (0.0040) +[2024-08-15 20:20:34,572][00707] Fps is (10 sec: 2457.6, 60 sec: 3345.4, 300 sec: 2978.9). 
Total num frames: 491520. Throughput: 0: 771.1. Samples: 122050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:20:34,577][00707] Avg episode reward: [(0, '4.546')] +[2024-08-15 20:20:39,572][00707] Fps is (10 sec: 2458.1, 60 sec: 3345.1, 300 sec: 2963.6). Total num frames: 503808. Throughput: 0: 805.2. Samples: 125486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:20:39,576][00707] Avg episode reward: [(0, '4.441')] +[2024-08-15 20:20:44,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2972.5). Total num frames: 520192. Throughput: 0: 821.2. Samples: 130610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:20:44,577][00707] Avg episode reward: [(0, '4.258')] +[2024-08-15 20:20:48,961][03483] Updated weights for policy 0, policy_version 130 (0.0038) +[2024-08-15 20:20:49,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2958.2). Total num frames: 532480. Throughput: 0: 827.4. Samples: 134010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:20:49,578][00707] Avg episode reward: [(0, '4.398')] +[2024-08-15 20:20:54,573][00707] Fps is (10 sec: 2867.1, 60 sec: 3208.7, 300 sec: 2966.8). Total num frames: 548864. Throughput: 0: 737.6. Samples: 136144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:20:54,579][00707] Avg episode reward: [(0, '4.496')] +[2024-08-15 20:20:59,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 2975.0). Total num frames: 565248. Throughput: 0: 783.6. Samples: 141910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:20:59,577][00707] Avg episode reward: [(0, '4.724')] +[2024-08-15 20:21:00,618][03483] Updated weights for policy 0, policy_version 140 (0.0028) +[2024-08-15 20:21:04,574][00707] Fps is (10 sec: 3686.1, 60 sec: 3140.2, 300 sec: 3003.7). Total num frames: 585728. Throughput: 0: 729.5. Samples: 145056. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:21:04,580][00707] Avg episode reward: [(0, '4.584')] +[2024-08-15 20:21:09,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2990.1). Total num frames: 598016. Throughput: 0: 750.6. Samples: 149090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:21:09,579][00707] Avg episode reward: [(0, '4.453')] +[2024-08-15 20:21:13,594][03483] Updated weights for policy 0, policy_version 150 (0.0040) +[2024-08-15 20:21:14,572][00707] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3017.1). Total num frames: 618496. Throughput: 0: 755.8. Samples: 154314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:21:14,576][00707] Avg episode reward: [(0, '4.485')] +[2024-08-15 20:21:19,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3042.7). Total num frames: 638976. Throughput: 0: 784.3. Samples: 157344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:21:19,580][00707] Avg episode reward: [(0, '4.626')] +[2024-08-15 20:21:24,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3029.1). Total num frames: 651264. Throughput: 0: 824.5. Samples: 162588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:21:24,575][00707] Avg episode reward: [(0, '4.636')] +[2024-08-15 20:21:24,584][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000159_651264.pth... +[2024-08-15 20:21:25,444][03483] Updated weights for policy 0, policy_version 160 (0.0022) +[2024-08-15 20:21:29,572][00707] Fps is (10 sec: 2457.6, 60 sec: 3072.1, 300 sec: 3016.1). Total num frames: 663552. Throughput: 0: 796.4. Samples: 166450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:21:29,578][00707] Avg episode reward: [(0, '4.652')] +[2024-08-15 20:21:34,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3058.3). Total num frames: 688128. Throughput: 0: 791.0. Samples: 169606. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:21:34,577][00707] Avg episode reward: [(0, '4.568')] +[2024-08-15 20:21:36,558][03483] Updated weights for policy 0, policy_version 170 (0.0033) +[2024-08-15 20:21:39,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3063.1). Total num frames: 704512. Throughput: 0: 882.9. Samples: 175874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:21:39,577][00707] Avg episode reward: [(0, '4.558')] +[2024-08-15 20:21:44,575][00707] Fps is (10 sec: 2866.5, 60 sec: 3276.7, 300 sec: 3050.2). Total num frames: 716800. Throughput: 0: 843.9. Samples: 179886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:21:44,580][00707] Avg episode reward: [(0, '4.478')] +[2024-08-15 20:21:49,357][03483] Updated weights for policy 0, policy_version 180 (0.0036) +[2024-08-15 20:21:49,573][00707] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3072.0). Total num frames: 737280. Throughput: 0: 818.4. Samples: 181884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:21:49,576][00707] Avg episode reward: [(0, '4.537')] +[2024-08-15 20:21:54,572][00707] Fps is (10 sec: 3687.3, 60 sec: 3413.4, 300 sec: 3076.2). Total num frames: 753664. Throughput: 0: 865.6. Samples: 188042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:21:54,578][00707] Avg episode reward: [(0, '4.458')] +[2024-08-15 20:21:59,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3080.2). Total num frames: 770048. Throughput: 0: 868.4. Samples: 193390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:21:59,579][00707] Avg episode reward: [(0, '4.506')] +[2024-08-15 20:22:01,563][03483] Updated weights for policy 0, policy_version 190 (0.0030) +[2024-08-15 20:22:04,573][00707] Fps is (10 sec: 2867.1, 60 sec: 3276.9, 300 sec: 3068.0). Total num frames: 782336. Throughput: 0: 840.8. Samples: 195182. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:22:04,578][00707] Avg episode reward: [(0, '4.494')] +[2024-08-15 20:22:09,575][00707] Fps is (10 sec: 3276.0, 60 sec: 3413.2, 300 sec: 3087.7). Total num frames: 802816. Throughput: 0: 839.0. Samples: 200344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:22:09,578][00707] Avg episode reward: [(0, '4.595')] +[2024-08-15 20:22:12,596][03483] Updated weights for policy 0, policy_version 200 (0.0021) +[2024-08-15 20:22:14,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3106.8). Total num frames: 823296. Throughput: 0: 889.9. Samples: 206494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:22:14,577][00707] Avg episode reward: [(0, '4.510')] +[2024-08-15 20:22:19,572][00707] Fps is (10 sec: 3687.3, 60 sec: 3345.1, 300 sec: 3109.9). Total num frames: 839680. Throughput: 0: 870.5. Samples: 208780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:22:19,577][00707] Avg episode reward: [(0, '4.449')] +[2024-08-15 20:22:24,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3113.0). Total num frames: 856064. Throughput: 0: 816.6. Samples: 212622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:22:24,575][00707] Avg episode reward: [(0, '4.501')] +[2024-08-15 20:22:25,465][03483] Updated weights for policy 0, policy_version 210 (0.0017) +[2024-08-15 20:22:29,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3115.9). Total num frames: 872448. Throughput: 0: 863.2. Samples: 218730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:22:29,578][00707] Avg episode reward: [(0, '4.337')] +[2024-08-15 20:22:34,575][00707] Fps is (10 sec: 3685.5, 60 sec: 3413.2, 300 sec: 3133.1). Total num frames: 892928. Throughput: 0: 938.7. Samples: 224128. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:22:34,578][00707] Avg episode reward: [(0, '4.393')] +[2024-08-15 20:22:37,440][03483] Updated weights for policy 0, policy_version 220 (0.0029) +[2024-08-15 20:22:39,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3121.4). Total num frames: 905216. Throughput: 0: 843.0. Samples: 225976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:22:39,575][00707] Avg episode reward: [(0, '4.465')] +[2024-08-15 20:22:44,574][00707] Fps is (10 sec: 3277.2, 60 sec: 3481.7, 300 sec: 3137.9). Total num frames: 925696. Throughput: 0: 837.1. Samples: 231060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:22:44,578][00707] Avg episode reward: [(0, '4.370')] +[2024-08-15 20:22:48,346][03483] Updated weights for policy 0, policy_version 230 (0.0037) +[2024-08-15 20:22:49,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3207.4). Total num frames: 946176. Throughput: 0: 936.9. Samples: 237344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:22:49,579][00707] Avg episode reward: [(0, '4.342')] +[2024-08-15 20:22:54,574][00707] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 958464. Throughput: 0: 871.9. Samples: 239578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:22:54,578][00707] Avg episode reward: [(0, '4.490')] +[2024-08-15 20:22:59,575][00707] Fps is (10 sec: 2457.1, 60 sec: 3344.9, 300 sec: 3290.7). Total num frames: 970752. Throughput: 0: 820.4. Samples: 243412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:22:59,581][00707] Avg episode reward: [(0, '4.440')] +[2024-08-15 20:23:01,523][03483] Updated weights for policy 0, policy_version 240 (0.0027) +[2024-08-15 20:23:04,572][00707] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 995328. Throughput: 0: 908.6. Samples: 249666. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:23:04,574][00707] Avg episode reward: [(0, '4.548')] +[2024-08-15 20:23:09,573][00707] Fps is (10 sec: 4096.4, 60 sec: 3481.7, 300 sec: 3360.1). Total num frames: 1011712. Throughput: 0: 883.5. Samples: 252380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:23:09,576][00707] Avg episode reward: [(0, '4.555')] +[2024-08-15 20:23:14,573][00707] Fps is (10 sec: 2457.5, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 1019904. Throughput: 0: 818.3. Samples: 255552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:23:14,575][00707] Avg episode reward: [(0, '4.629')] +[2024-08-15 20:23:15,473][03483] Updated weights for policy 0, policy_version 250 (0.0053) +[2024-08-15 20:23:19,572][00707] Fps is (10 sec: 2048.2, 60 sec: 3208.5, 300 sec: 3332.4). Total num frames: 1032192. Throughput: 0: 732.7. Samples: 257096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:23:19,581][00707] Avg episode reward: [(0, '4.514')] +[2024-08-15 20:23:24,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1052672. Throughput: 0: 805.9. Samples: 262242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:23:24,574][00707] Avg episode reward: [(0, '4.658')] +[2024-08-15 20:23:24,585][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000257_1052672.pth... +[2024-08-15 20:23:24,749][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000062_253952.pth +[2024-08-15 20:23:27,088][03483] Updated weights for policy 0, policy_version 260 (0.0029) +[2024-08-15 20:23:29,573][00707] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 1073152. Throughput: 0: 831.1. Samples: 268460. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:23:29,577][00707] Avg episode reward: [(0, '4.713')] +[2024-08-15 20:23:34,577][00707] Fps is (10 sec: 3275.5, 60 sec: 3208.4, 300 sec: 3332.3). Total num frames: 1085440. Throughput: 0: 736.0. Samples: 270468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:23:34,579][00707] Avg episode reward: [(0, '4.665')] +[2024-08-15 20:23:39,573][00707] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1101824. Throughput: 0: 774.2. Samples: 274418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:23:39,577][00707] Avg episode reward: [(0, '4.685')] +[2024-08-15 20:23:39,978][03483] Updated weights for policy 0, policy_version 270 (0.0030) +[2024-08-15 20:23:44,573][00707] Fps is (10 sec: 3687.8, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1122304. Throughput: 0: 831.2. Samples: 280816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:23:44,575][00707] Avg episode reward: [(0, '4.661')] +[2024-08-15 20:23:49,572][00707] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3332.3). Total num frames: 1138688. Throughput: 0: 763.7. Samples: 284032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:23:49,581][00707] Avg episode reward: [(0, '4.699')] +[2024-08-15 20:23:51,168][03483] Updated weights for policy 0, policy_version 280 (0.0021) +[2024-08-15 20:23:54,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3276.9, 300 sec: 3346.2). Total num frames: 1155072. Throughput: 0: 795.1. Samples: 288158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:23:54,582][00707] Avg episode reward: [(0, '4.884')] +[2024-08-15 20:23:54,591][03470] Saving new best policy, reward=4.884! +[2024-08-15 20:23:59,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3346.2). Total num frames: 1171456. Throughput: 0: 841.6. Samples: 293422. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:23:59,575][00707] Avg episode reward: [(0, '4.969')] +[2024-08-15 20:23:59,582][03470] Saving new best policy, reward=4.969! +[2024-08-15 20:24:02,850][03483] Updated weights for policy 0, policy_version 290 (0.0032) +[2024-08-15 20:24:04,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1191936. Throughput: 0: 871.6. Samples: 296318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:24:04,575][00707] Avg episode reward: [(0, '4.779')] +[2024-08-15 20:24:09,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3332.4). Total num frames: 1208320. Throughput: 0: 877.9. Samples: 301748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:24:09,575][00707] Avg episode reward: [(0, '4.794')] +[2024-08-15 20:24:14,573][00707] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.4). Total num frames: 1220608. Throughput: 0: 826.9. Samples: 305672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:24:14,580][00707] Avg episode reward: [(0, '4.971')] +[2024-08-15 20:24:14,593][03470] Saving new best policy, reward=4.971! +[2024-08-15 20:24:15,785][03483] Updated weights for policy 0, policy_version 300 (0.0032) +[2024-08-15 20:24:19,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 1241088. Throughput: 0: 850.2. Samples: 308724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:24:19,581][00707] Avg episode reward: [(0, '4.949')] +[2024-08-15 20:24:24,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 1261568. Throughput: 0: 899.7. Samples: 314906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:24:24,578][00707] Avg episode reward: [(0, '4.932')] +[2024-08-15 20:24:27,269][03483] Updated weights for policy 0, policy_version 310 (0.0018) +[2024-08-15 20:24:29,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.4). 
Total num frames: 1273856. Throughput: 0: 848.6. Samples: 319004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:24:29,579][00707] Avg episode reward: [(0, '5.074')] +[2024-08-15 20:24:29,584][03470] Saving new best policy, reward=5.074! +[2024-08-15 20:24:34,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3413.6, 300 sec: 3346.2). Total num frames: 1290240. Throughput: 0: 816.0. Samples: 320752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:24:34,580][00707] Avg episode reward: [(0, '5.418')] +[2024-08-15 20:24:34,588][03470] Saving new best policy, reward=5.418! +[2024-08-15 20:24:38,960][03483] Updated weights for policy 0, policy_version 320 (0.0021) +[2024-08-15 20:24:39,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 1310720. Throughput: 0: 864.0. Samples: 327038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:24:39,577][00707] Avg episode reward: [(0, '5.792')] +[2024-08-15 20:24:39,579][03470] Saving new best policy, reward=5.792! +[2024-08-15 20:24:44,573][00707] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 1327104. Throughput: 0: 864.8. Samples: 332336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:24:44,579][00707] Avg episode reward: [(0, '6.012')] +[2024-08-15 20:24:44,597][03470] Saving new best policy, reward=6.012! +[2024-08-15 20:24:49,584][00707] Fps is (10 sec: 2863.8, 60 sec: 3344.4, 300 sec: 3332.3). Total num frames: 1339392. Throughput: 0: 841.5. Samples: 334194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:24:49,587][00707] Avg episode reward: [(0, '6.265')] +[2024-08-15 20:24:49,591][03470] Saving new best policy, reward=6.265! +[2024-08-15 20:24:52,050][03483] Updated weights for policy 0, policy_version 330 (0.0042) +[2024-08-15 20:24:54,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1359872. Throughput: 0: 830.3. Samples: 339112. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:24:54,575][00707] Avg episode reward: [(0, '6.225')] +[2024-08-15 20:24:59,572][00707] Fps is (10 sec: 4100.9, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 1380352. Throughput: 0: 885.4. Samples: 345514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:24:59,579][00707] Avg episode reward: [(0, '6.161')] +[2024-08-15 20:25:03,182][03483] Updated weights for policy 0, policy_version 340 (0.0034) +[2024-08-15 20:25:04,573][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 1392640. Throughput: 0: 866.8. Samples: 347732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:25:04,575][00707] Avg episode reward: [(0, '6.278')] +[2024-08-15 20:25:04,648][03470] Saving new best policy, reward=6.278! +[2024-08-15 20:25:09,578][00707] Fps is (10 sec: 2865.7, 60 sec: 3344.8, 300 sec: 3332.3). Total num frames: 1409024. Throughput: 0: 812.9. Samples: 351490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:25:09,580][00707] Avg episode reward: [(0, '6.390')] +[2024-08-15 20:25:09,586][03470] Saving new best policy, reward=6.390! +[2024-08-15 20:25:14,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 1429504. Throughput: 0: 853.3. Samples: 357402. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:25:14,575][00707] Avg episode reward: [(0, '6.219')] +[2024-08-15 20:25:15,310][03483] Updated weights for policy 0, policy_version 350 (0.0031) +[2024-08-15 20:25:19,573][00707] Fps is (10 sec: 4098.1, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 1449984. Throughput: 0: 939.2. Samples: 363018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:25:19,577][00707] Avg episode reward: [(0, '6.042')] +[2024-08-15 20:25:24,573][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.4). Total num frames: 1462272. Throughput: 0: 841.8. Samples: 364918. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:25:24,577][00707] Avg episode reward: [(0, '6.298')] +[2024-08-15 20:25:24,585][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000357_1462272.pth... +[2024-08-15 20:25:24,754][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000159_651264.pth +[2024-08-15 20:25:28,195][03483] Updated weights for policy 0, policy_version 360 (0.0046) +[2024-08-15 20:25:29,573][00707] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1478656. Throughput: 0: 830.6. Samples: 369714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:25:29,581][00707] Avg episode reward: [(0, '6.718')] +[2024-08-15 20:25:29,585][03470] Saving new best policy, reward=6.718! +[2024-08-15 20:25:34,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 1499136. Throughput: 0: 853.6. Samples: 372596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:25:34,576][00707] Avg episode reward: [(0, '6.859')] +[2024-08-15 20:25:34,596][03470] Saving new best policy, reward=6.859! +[2024-08-15 20:25:39,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 1511424. Throughput: 0: 867.7. Samples: 378160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:25:39,577][00707] Avg episode reward: [(0, '7.048')] +[2024-08-15 20:25:39,580][03470] Saving new best policy, reward=7.048! +[2024-08-15 20:25:39,972][03483] Updated weights for policy 0, policy_version 370 (0.0028) +[2024-08-15 20:25:44,572][00707] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 1523712. Throughput: 0: 805.9. Samples: 381778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:25:44,580][00707] Avg episode reward: [(0, '7.455')] +[2024-08-15 20:25:44,593][03470] Saving new best policy, reward=7.455! 
+[2024-08-15 20:25:49,575][00707] Fps is (10 sec: 2456.8, 60 sec: 3277.3, 300 sec: 3346.2). Total num frames: 1536000. Throughput: 0: 793.2. Samples: 383428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:25:49,582][00707] Avg episode reward: [(0, '7.469')] +[2024-08-15 20:25:49,586][03470] Saving new best policy, reward=7.469! +[2024-08-15 20:25:54,425][03483] Updated weights for policy 0, policy_version 380 (0.0016) +[2024-08-15 20:25:54,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 1556480. Throughput: 0: 803.6. Samples: 387646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:25:54,579][00707] Avg episode reward: [(0, '7.741')] +[2024-08-15 20:25:54,589][03470] Saving new best policy, reward=7.741! +[2024-08-15 20:25:59,572][00707] Fps is (10 sec: 3277.8, 60 sec: 3140.3, 300 sec: 3332.4). Total num frames: 1568768. Throughput: 0: 778.8. Samples: 392448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:25:59,574][00707] Avg episode reward: [(0, '7.864')] +[2024-08-15 20:25:59,582][03470] Saving new best policy, reward=7.864! +[2024-08-15 20:26:04,582][00707] Fps is (10 sec: 2455.3, 60 sec: 3139.8, 300 sec: 3332.2). Total num frames: 1581056. Throughput: 0: 750.2. Samples: 396784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:26:04,584][00707] Avg episode reward: [(0, '7.604')] +[2024-08-15 20:26:07,660][03483] Updated weights for policy 0, policy_version 390 (0.0033) +[2024-08-15 20:26:09,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3277.1, 300 sec: 3346.2). Total num frames: 1605632. Throughput: 0: 777.3. Samples: 399896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:26:09,577][00707] Avg episode reward: [(0, '8.523')] +[2024-08-15 20:26:09,581][03470] Saving new best policy, reward=8.523! +[2024-08-15 20:26:14,572][00707] Fps is (10 sec: 4099.8, 60 sec: 3208.5, 300 sec: 3332.3). Total num frames: 1622016. Throughput: 0: 805.5. Samples: 405962. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:26:14,580][00707] Avg episode reward: [(0, '9.021')] +[2024-08-15 20:26:14,594][03470] Saving new best policy, reward=9.021! +[2024-08-15 20:26:19,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3332.3). Total num frames: 1634304. Throughput: 0: 781.2. Samples: 407752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:26:19,577][00707] Avg episode reward: [(0, '9.272')] +[2024-08-15 20:26:19,582][03470] Saving new best policy, reward=9.272! +[2024-08-15 20:26:20,611][03483] Updated weights for policy 0, policy_version 400 (0.0053) +[2024-08-15 20:26:24,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3360.1). Total num frames: 1654784. Throughput: 0: 757.8. Samples: 412262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:26:24,574][00707] Avg episode reward: [(0, '8.416')] +[2024-08-15 20:26:29,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1675264. Throughput: 0: 818.6. Samples: 418616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:26:29,575][00707] Avg episode reward: [(0, '7.917')] +[2024-08-15 20:26:30,231][03483] Updated weights for policy 0, policy_version 410 (0.0024) +[2024-08-15 20:26:34,574][00707] Fps is (10 sec: 3685.9, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 1691648. Throughput: 0: 848.7. Samples: 421616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:26:34,580][00707] Avg episode reward: [(0, '8.718')] +[2024-08-15 20:26:39,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 1703936. Throughput: 0: 838.3. Samples: 425368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:26:39,576][00707] Avg episode reward: [(0, '9.273')] +[2024-08-15 20:26:39,582][03470] Saving new best policy, reward=9.273! 
+[2024-08-15 20:26:43,228][03483] Updated weights for policy 0, policy_version 420 (0.0032) +[2024-08-15 20:26:44,572][00707] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 1724416. Throughput: 0: 855.4. Samples: 430942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:26:44,581][00707] Avg episode reward: [(0, '10.406')] +[2024-08-15 20:26:44,596][03470] Saving new best policy, reward=10.406! +[2024-08-15 20:26:49,573][00707] Fps is (10 sec: 4095.9, 60 sec: 3481.8, 300 sec: 3360.1). Total num frames: 1744896. Throughput: 0: 892.5. Samples: 436938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:26:49,576][00707] Avg episode reward: [(0, '11.008')] +[2024-08-15 20:26:49,582][03470] Saving new best policy, reward=11.008! +[2024-08-15 20:26:54,573][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 1757184. Throughput: 0: 864.9. Samples: 438818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-08-15 20:26:54,577][00707] Avg episode reward: [(0, '11.400')] +[2024-08-15 20:26:54,596][03470] Saving new best policy, reward=11.400! +[2024-08-15 20:26:56,106][03483] Updated weights for policy 0, policy_version 430 (0.0024) +[2024-08-15 20:26:59,572][00707] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1773568. Throughput: 0: 824.7. Samples: 443074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:26:59,577][00707] Avg episode reward: [(0, '11.412')] +[2024-08-15 20:26:59,581][03470] Saving new best policy, reward=11.412! +[2024-08-15 20:27:04,573][00707] Fps is (10 sec: 3276.8, 60 sec: 3482.1, 300 sec: 3346.2). Total num frames: 1789952. Throughput: 0: 852.0. Samples: 446092. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-08-15 20:27:04,575][00707] Avg episode reward: [(0, '11.131')] +[2024-08-15 20:27:06,583][03483] Updated weights for policy 0, policy_version 440 (0.0033) +[2024-08-15 20:27:09,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1810432. Throughput: 0: 887.7. Samples: 452210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:27:09,578][00707] Avg episode reward: [(0, '11.162')] +[2024-08-15 20:27:14,574][00707] Fps is (10 sec: 3276.4, 60 sec: 3345.0, 300 sec: 3332.3). Total num frames: 1822720. Throughput: 0: 832.1. Samples: 456060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:27:14,579][00707] Avg episode reward: [(0, '11.913')] +[2024-08-15 20:27:14,590][03470] Saving new best policy, reward=11.913! +[2024-08-15 20:27:19,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 1839104. Throughput: 0: 815.4. Samples: 458306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:27:19,578][00707] Avg episode reward: [(0, '12.609')] +[2024-08-15 20:27:19,655][03470] Saving new best policy, reward=12.609! +[2024-08-15 20:27:19,664][03483] Updated weights for policy 0, policy_version 450 (0.0029) +[2024-08-15 20:27:24,572][00707] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1859584. Throughput: 0: 869.1. Samples: 464476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:27:24,579][00707] Avg episode reward: [(0, '12.506')] +[2024-08-15 20:27:24,632][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000455_1863680.pth... +[2024-08-15 20:27:24,785][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000257_1052672.pth +[2024-08-15 20:27:29,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3332.4). Total num frames: 1875968. Throughput: 0: 854.9. Samples: 469412. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:27:29,575][00707] Avg episode reward: [(0, '12.262')] +[2024-08-15 20:27:32,104][03483] Updated weights for policy 0, policy_version 460 (0.0018) +[2024-08-15 20:27:34,575][00707] Fps is (10 sec: 2866.4, 60 sec: 3276.7, 300 sec: 3332.3). Total num frames: 1888256. Throughput: 0: 811.2. Samples: 473442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:27:34,583][00707] Avg episode reward: [(0, '11.559')] +[2024-08-15 20:27:39,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3332.4). Total num frames: 1908736. Throughput: 0: 838.5. Samples: 476552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:27:39,581][00707] Avg episode reward: [(0, '11.909')] +[2024-08-15 20:27:42,696][03483] Updated weights for policy 0, policy_version 470 (0.0020) +[2024-08-15 20:27:44,575][00707] Fps is (10 sec: 4096.1, 60 sec: 3413.2, 300 sec: 3332.3). Total num frames: 1929216. Throughput: 0: 881.6. Samples: 482750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:27:44,584][00707] Avg episode reward: [(0, '12.014')] +[2024-08-15 20:27:49,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3332.4). Total num frames: 1941504. Throughput: 0: 860.5. Samples: 484816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:27:49,578][00707] Avg episode reward: [(0, '12.993')] +[2024-08-15 20:27:49,584][03470] Saving new best policy, reward=12.993! +[2024-08-15 20:27:54,572][00707] Fps is (10 sec: 2867.9, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 1957888. Throughput: 0: 812.7. Samples: 488780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:27:54,575][00707] Avg episode reward: [(0, '14.065')] +[2024-08-15 20:27:54,589][03470] Saving new best policy, reward=14.065! +[2024-08-15 20:27:55,986][03483] Updated weights for policy 0, policy_version 480 (0.0042) +[2024-08-15 20:27:59,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3332.3). 
Total num frames: 1978368. Throughput: 0: 866.4. Samples: 495046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:27:59,575][00707] Avg episode reward: [(0, '14.568')] +[2024-08-15 20:27:59,579][03470] Saving new best policy, reward=14.568! +[2024-08-15 20:28:04,573][00707] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 1994752. Throughput: 0: 884.0. Samples: 498084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:28:04,575][00707] Avg episode reward: [(0, '14.671')] +[2024-08-15 20:28:04,586][03470] Saving new best policy, reward=14.671! +[2024-08-15 20:28:08,285][03483] Updated weights for policy 0, policy_version 490 (0.0056) +[2024-08-15 20:28:09,574][00707] Fps is (10 sec: 2866.6, 60 sec: 3276.7, 300 sec: 3346.2). Total num frames: 2007040. Throughput: 0: 834.1. Samples: 502012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:28:09,577][00707] Avg episode reward: [(0, '15.200')] +[2024-08-15 20:28:09,586][03470] Saving new best policy, reward=15.200! +[2024-08-15 20:28:14,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 2027520. Throughput: 0: 842.3. Samples: 507314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:28:14,581][00707] Avg episode reward: [(0, '14.946')] +[2024-08-15 20:28:18,996][03483] Updated weights for policy 0, policy_version 500 (0.0022) +[2024-08-15 20:28:19,572][00707] Fps is (10 sec: 4096.8, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 2048000. Throughput: 0: 821.7. Samples: 510416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:28:19,576][00707] Avg episode reward: [(0, '15.593')] +[2024-08-15 20:28:19,581][03470] Saving new best policy, reward=15.593! +[2024-08-15 20:28:24,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2056192. Throughput: 0: 841.2. Samples: 514406. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:28:24,575][00707] Avg episode reward: [(0, '14.748')] +[2024-08-15 20:28:29,576][00707] Fps is (10 sec: 2047.2, 60 sec: 3208.3, 300 sec: 3332.3). Total num frames: 2068480. Throughput: 0: 769.3. Samples: 517368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:28:29,581][00707] Avg episode reward: [(0, '15.670')] +[2024-08-15 20:28:29,583][03470] Saving new best policy, reward=15.670! +[2024-08-15 20:28:34,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3332.3). Total num frames: 2084864. Throughput: 0: 769.1. Samples: 519424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:28:34,575][00707] Avg episode reward: [(0, '15.060')] +[2024-08-15 20:28:35,089][03483] Updated weights for policy 0, policy_version 510 (0.0031) +[2024-08-15 20:28:39,572][00707] Fps is (10 sec: 3687.8, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2105344. Throughput: 0: 817.1. Samples: 525548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-08-15 20:28:39,578][00707] Avg episode reward: [(0, '15.321')] +[2024-08-15 20:28:44,573][00707] Fps is (10 sec: 3686.3, 60 sec: 3208.7, 300 sec: 3332.3). Total num frames: 2121728. Throughput: 0: 795.5. Samples: 530842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:28:44,575][00707] Avg episode reward: [(0, '14.972')] +[2024-08-15 20:28:47,150][03483] Updated weights for policy 0, policy_version 520 (0.0053) +[2024-08-15 20:28:49,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 2134016. Throughput: 0: 769.2. Samples: 532700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:28:49,581][00707] Avg episode reward: [(0, '15.607')] +[2024-08-15 20:28:54,572][00707] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2154496. Throughput: 0: 797.2. Samples: 537884. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:28:54,580][00707] Avg episode reward: [(0, '15.456')] +[2024-08-15 20:28:57,768][03483] Updated weights for policy 0, policy_version 530 (0.0042) +[2024-08-15 20:28:59,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2174976. Throughput: 0: 819.9. Samples: 544208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:28:59,581][00707] Avg episode reward: [(0, '14.628')] +[2024-08-15 20:29:04,575][00707] Fps is (10 sec: 3276.1, 60 sec: 3208.4, 300 sec: 3318.4). Total num frames: 2187264. Throughput: 0: 798.7. Samples: 546358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:29:04,582][00707] Avg episode reward: [(0, '14.884')] +[2024-08-15 20:29:09,573][00707] Fps is (10 sec: 2867.2, 60 sec: 3276.9, 300 sec: 3332.3). Total num frames: 2203648. Throughput: 0: 793.6. Samples: 550118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:29:09,579][00707] Avg episode reward: [(0, '14.218')] +[2024-08-15 20:29:10,827][03483] Updated weights for policy 0, policy_version 540 (0.0029) +[2024-08-15 20:29:14,572][00707] Fps is (10 sec: 3687.2, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2224128. Throughput: 0: 868.8. Samples: 556462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:29:14,575][00707] Avg episode reward: [(0, '15.283')] +[2024-08-15 20:29:19,574][00707] Fps is (10 sec: 4095.2, 60 sec: 3276.7, 300 sec: 3332.3). Total num frames: 2244608. Throughput: 0: 892.3. Samples: 559580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:29:19,580][00707] Avg episode reward: [(0, '15.783')] +[2024-08-15 20:29:19,582][03470] Saving new best policy, reward=15.783! +[2024-08-15 20:29:22,581][03483] Updated weights for policy 0, policy_version 550 (0.0025) +[2024-08-15 20:29:24,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 2256896. Throughput: 0: 847.7. Samples: 563696. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:29:24,580][00707] Avg episode reward: [(0, '16.904')] +[2024-08-15 20:29:24,592][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000551_2256896.pth... +[2024-08-15 20:29:24,812][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000357_1462272.pth +[2024-08-15 20:29:24,829][03470] Saving new best policy, reward=16.904! +[2024-08-15 20:29:29,572][00707] Fps is (10 sec: 2867.8, 60 sec: 3413.5, 300 sec: 3332.3). Total num frames: 2273280. Throughput: 0: 838.8. Samples: 568588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:29:29,575][00707] Avg episode reward: [(0, '18.104')] +[2024-08-15 20:29:29,582][03470] Saving new best policy, reward=18.104! +[2024-08-15 20:29:33,814][03483] Updated weights for policy 0, policy_version 560 (0.0027) +[2024-08-15 20:29:34,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 2293760. Throughput: 0: 865.2. Samples: 571632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:29:34,579][00707] Avg episode reward: [(0, '19.193')] +[2024-08-15 20:29:34,592][03470] Saving new best policy, reward=19.193! +[2024-08-15 20:29:39,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 2310144. Throughput: 0: 866.1. Samples: 576860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:29:39,578][00707] Avg episode reward: [(0, '18.644')] +[2024-08-15 20:29:44,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.5). Total num frames: 2322432. Throughput: 0: 812.5. Samples: 580770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:29:44,580][00707] Avg episode reward: [(0, '18.048')] +[2024-08-15 20:29:47,009][03483] Updated weights for policy 0, policy_version 570 (0.0021) +[2024-08-15 20:29:49,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 2342912. 
Throughput: 0: 832.8. Samples: 583834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:29:49,580][00707] Avg episode reward: [(0, '17.643')] +[2024-08-15 20:29:54,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 2363392. Throughput: 0: 891.3. Samples: 590226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:29:54,575][00707] Avg episode reward: [(0, '17.789')] +[2024-08-15 20:29:58,425][03483] Updated weights for policy 0, policy_version 580 (0.0033) +[2024-08-15 20:29:59,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 2375680. Throughput: 0: 844.4. Samples: 594460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:29:59,575][00707] Avg episode reward: [(0, '17.061')] +[2024-08-15 20:30:04,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3346.3). Total num frames: 2396160. Throughput: 0: 817.4. Samples: 596360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:30:04,577][00707] Avg episode reward: [(0, '17.131')] +[2024-08-15 20:30:09,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 2412544. Throughput: 0: 862.1. Samples: 602492. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:30:09,578][00707] Avg episode reward: [(0, '17.474')] +[2024-08-15 20:30:09,732][03483] Updated weights for policy 0, policy_version 590 (0.0020) +[2024-08-15 20:30:14,573][00707] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 2428928. Throughput: 0: 874.6. Samples: 607944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:30:14,578][00707] Avg episode reward: [(0, '17.316')] +[2024-08-15 20:30:19,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3276.9, 300 sec: 3318.5). Total num frames: 2441216. Throughput: 0: 849.0. Samples: 609838. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:30:19,575][00707] Avg episode reward: [(0, '17.474')] +[2024-08-15 20:30:22,722][03483] Updated weights for policy 0, policy_version 600 (0.0031) +[2024-08-15 20:30:24,573][00707] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 2465792. Throughput: 0: 844.8. Samples: 614878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:30:24,578][00707] Avg episode reward: [(0, '16.208')] +[2024-08-15 20:30:29,572][00707] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3346.2). Total num frames: 2486272. Throughput: 0: 899.8. Samples: 621260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:30:29,574][00707] Avg episode reward: [(0, '15.416')] +[2024-08-15 20:30:34,108][03483] Updated weights for policy 0, policy_version 610 (0.0039) +[2024-08-15 20:30:34,575][00707] Fps is (10 sec: 3276.1, 60 sec: 3413.2, 300 sec: 3346.2). Total num frames: 2498560. Throughput: 0: 883.6. Samples: 623598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:30:34,578][00707] Avg episode reward: [(0, '17.296')] +[2024-08-15 20:30:39,573][00707] Fps is (10 sec: 2457.5, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 2510848. Throughput: 0: 823.5. Samples: 627284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:30:39,579][00707] Avg episode reward: [(0, '17.980')] +[2024-08-15 20:30:44,572][00707] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 2535424. Throughput: 0: 867.7. Samples: 633508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:30:44,577][00707] Avg episode reward: [(0, '18.473')] +[2024-08-15 20:30:45,296][03483] Updated weights for policy 0, policy_version 620 (0.0017) +[2024-08-15 20:30:49,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 2551808. Throughput: 0: 896.4. Samples: 636698. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:30:49,577][00707] Avg episode reward: [(0, '18.242')] +[2024-08-15 20:30:54,576][00707] Fps is (10 sec: 2866.3, 60 sec: 3344.9, 300 sec: 3374.0). Total num frames: 2564096. Throughput: 0: 856.3. Samples: 641028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-08-15 20:30:54,578][00707] Avg episode reward: [(0, '18.804')] +[2024-08-15 20:30:59,572][00707] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3374.1). Total num frames: 2576384. Throughput: 0: 808.0. Samples: 644302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:30:59,575][00707] Avg episode reward: [(0, '20.067')] +[2024-08-15 20:30:59,579][03470] Saving new best policy, reward=20.067! +[2024-08-15 20:31:00,673][03483] Updated weights for policy 0, policy_version 630 (0.0040) +[2024-08-15 20:31:04,572][00707] Fps is (10 sec: 2868.2, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 2592768. Throughput: 0: 808.2. Samples: 646208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:31:04,577][00707] Avg episode reward: [(0, '18.989')] +[2024-08-15 20:31:09,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 2609152. Throughput: 0: 830.1. Samples: 652232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:31:09,578][00707] Avg episode reward: [(0, '17.681')] +[2024-08-15 20:31:13,290][03483] Updated weights for policy 0, policy_version 640 (0.0028) +[2024-08-15 20:31:14,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 2621440. Throughput: 0: 774.2. Samples: 656100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:31:14,580][00707] Avg episode reward: [(0, '17.127')] +[2024-08-15 20:31:19,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 2641920. Throughput: 0: 775.2. Samples: 658482. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:31:19,580][00707] Avg episode reward: [(0, '16.925')] +[2024-08-15 20:31:23,885][03483] Updated weights for policy 0, policy_version 650 (0.0017) +[2024-08-15 20:31:24,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 2662400. Throughput: 0: 834.5. Samples: 664838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:31:24,574][00707] Avg episode reward: [(0, '17.111')] +[2024-08-15 20:31:24,587][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000650_2662400.pth... +[2024-08-15 20:31:24,722][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000455_1863680.pth +[2024-08-15 20:31:29,574][00707] Fps is (10 sec: 3685.7, 60 sec: 3208.4, 300 sec: 3346.2). Total num frames: 2678784. Throughput: 0: 805.8. Samples: 669772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:31:29,577][00707] Avg episode reward: [(0, '17.059')] +[2024-08-15 20:31:34,574][00707] Fps is (10 sec: 2866.8, 60 sec: 3208.6, 300 sec: 3346.2). Total num frames: 2691072. Throughput: 0: 827.2. Samples: 673922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:31:34,576][00707] Avg episode reward: [(0, '16.973')] +[2024-08-15 20:31:37,236][03483] Updated weights for policy 0, policy_version 660 (0.0063) +[2024-08-15 20:31:39,572][00707] Fps is (10 sec: 3277.5, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 2711552. Throughput: 0: 796.6. Samples: 676874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:31:39,579][00707] Avg episode reward: [(0, '17.908')] +[2024-08-15 20:31:44,572][00707] Fps is (10 sec: 4096.6, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 2732032. Throughput: 0: 862.2. Samples: 683100. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:31:44,575][00707] Avg episode reward: [(0, '17.790')] +[2024-08-15 20:31:49,185][03483] Updated weights for policy 0, policy_version 670 (0.0021) +[2024-08-15 20:31:49,576][00707] Fps is (10 sec: 3275.6, 60 sec: 3208.3, 300 sec: 3346.2). Total num frames: 2744320. Throughput: 0: 905.7. Samples: 686968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:31:49,582][00707] Avg episode reward: [(0, '17.054')] +[2024-08-15 20:31:54,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3346.2). Total num frames: 2760704. Throughput: 0: 824.4. Samples: 689330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:31:54,580][00707] Avg episode reward: [(0, '17.478')] +[2024-08-15 20:31:59,572][00707] Fps is (10 sec: 3687.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 2781184. Throughput: 0: 880.3. Samples: 695712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:31:59,581][00707] Avg episode reward: [(0, '18.644')] +[2024-08-15 20:31:59,765][03483] Updated weights for policy 0, policy_version 680 (0.0023) +[2024-08-15 20:32:04,574][00707] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3346.2). Total num frames: 2797568. Throughput: 0: 896.8. Samples: 698840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:32:04,586][00707] Avg episode reward: [(0, '16.132')] +[2024-08-15 20:32:09,577][00707] Fps is (10 sec: 2866.0, 60 sec: 3344.8, 300 sec: 3346.2). Total num frames: 2809856. Throughput: 0: 837.3. Samples: 702520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:32:09,584][00707] Avg episode reward: [(0, '16.334')] +[2024-08-15 20:32:12,856][03483] Updated weights for policy 0, policy_version 690 (0.0016) +[2024-08-15 20:32:14,572][00707] Fps is (10 sec: 3277.3, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 2830336. Throughput: 0: 847.3. Samples: 707900. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:32:14,580][00707] Avg episode reward: [(0, '17.389')] +[2024-08-15 20:32:19,573][00707] Fps is (10 sec: 4097.5, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 2850816. Throughput: 0: 895.8. Samples: 714230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:32:19,576][00707] Avg episode reward: [(0, '18.772')] +[2024-08-15 20:32:24,470][03483] Updated weights for policy 0, policy_version 700 (0.0027) +[2024-08-15 20:32:24,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 2867200. Throughput: 0: 876.0. Samples: 716294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:32:24,580][00707] Avg episode reward: [(0, '18.552')] +[2024-08-15 20:32:29,572][00707] Fps is (10 sec: 3277.0, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 2883584. Throughput: 0: 831.4. Samples: 720514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-08-15 20:32:29,580][00707] Avg episode reward: [(0, '18.812')] +[2024-08-15 20:32:34,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 2904064. Throughput: 0: 813.8. Samples: 723588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:32:34,575][00707] Avg episode reward: [(0, '19.151')] +[2024-08-15 20:32:35,358][03483] Updated weights for policy 0, policy_version 710 (0.0046) +[2024-08-15 20:32:39,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 2920448. Throughput: 0: 901.9. Samples: 729916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:32:39,577][00707] Avg episode reward: [(0, '20.023')] +[2024-08-15 20:32:44,577][00707] Fps is (10 sec: 2865.8, 60 sec: 3344.8, 300 sec: 3360.0). Total num frames: 2932736. Throughput: 0: 843.9. Samples: 733690. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:32:44,580][00707] Avg episode reward: [(0, '20.147')] +[2024-08-15 20:32:44,593][03470] Saving new best policy, reward=20.147! +[2024-08-15 20:32:48,442][03483] Updated weights for policy 0, policy_version 720 (0.0027) +[2024-08-15 20:32:49,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3374.0). Total num frames: 2953216. Throughput: 0: 822.1. Samples: 735832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:32:49,576][00707] Avg episode reward: [(0, '20.757')] +[2024-08-15 20:32:49,586][03470] Saving new best policy, reward=20.757! +[2024-08-15 20:32:54,573][00707] Fps is (10 sec: 4098.0, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 2973696. Throughput: 0: 878.8. Samples: 742062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:32:54,580][00707] Avg episode reward: [(0, '21.289')] +[2024-08-15 20:32:54,590][03470] Saving new best policy, reward=21.289! +[2024-08-15 20:32:59,572][00707] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 2985984. Throughput: 0: 871.9. Samples: 747136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:32:59,581][00707] Avg episode reward: [(0, '21.804')] +[2024-08-15 20:32:59,590][03470] Saving new best policy, reward=21.804! +[2024-08-15 20:33:00,123][03483] Updated weights for policy 0, policy_version 730 (0.0026) +[2024-08-15 20:33:04,573][00707] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 3002368. Throughput: 0: 772.8. Samples: 749004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:33:04,577][00707] Avg episode reward: [(0, '20.731')] +[2024-08-15 20:33:09,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3374.0). Total num frames: 3022848. Throughput: 0: 846.7. Samples: 754394. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:33:09,575][00707] Avg episode reward: [(0, '20.477')] +[2024-08-15 20:33:11,545][03483] Updated weights for policy 0, policy_version 740 (0.0023) +[2024-08-15 20:33:14,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 3039232. Throughput: 0: 891.3. Samples: 760622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:33:14,575][00707] Avg episode reward: [(0, '19.290')] +[2024-08-15 20:33:19,580][00707] Fps is (10 sec: 2864.9, 60 sec: 3344.6, 300 sec: 3373.9). Total num frames: 3051520. Throughput: 0: 867.4. Samples: 762628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:33:19,585][00707] Avg episode reward: [(0, '19.516')] +[2024-08-15 20:33:24,467][03483] Updated weights for policy 0, policy_version 750 (0.0030) +[2024-08-15 20:33:24,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 3072000. Throughput: 0: 819.6. Samples: 766796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:33:24,577][00707] Avg episode reward: [(0, '19.608')] +[2024-08-15 20:33:24,593][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000750_3072000.pth... +[2024-08-15 20:33:24,770][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000551_2256896.pth +[2024-08-15 20:33:29,578][00707] Fps is (10 sec: 3277.7, 60 sec: 3344.8, 300 sec: 3387.8). Total num frames: 3084288. Throughput: 0: 833.7. Samples: 771208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:33:29,580][00707] Avg episode reward: [(0, '20.030')] +[2024-08-15 20:33:34,572][00707] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3360.1). Total num frames: 3096576. Throughput: 0: 829.2. Samples: 773144. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:33:34,577][00707] Avg episode reward: [(0, '20.607')] +[2024-08-15 20:33:39,573][00707] Fps is (10 sec: 2458.8, 60 sec: 3140.2, 300 sec: 3346.2). Total num frames: 3108864. Throughput: 0: 772.3. Samples: 776816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:33:39,575][00707] Avg episode reward: [(0, '20.346')] +[2024-08-15 20:33:40,393][03483] Updated weights for policy 0, policy_version 760 (0.0034) +[2024-08-15 20:33:44,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3277.1, 300 sec: 3374.0). Total num frames: 3129344. Throughput: 0: 777.1. Samples: 782106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-15 20:33:44,582][00707] Avg episode reward: [(0, '20.687')] +[2024-08-15 20:33:49,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 3149824. Throughput: 0: 804.8. Samples: 785218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:33:49,575][00707] Avg episode reward: [(0, '19.375')] +[2024-08-15 20:33:50,586][03483] Updated weights for policy 0, policy_version 770 (0.0022) +[2024-08-15 20:33:54,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3346.2). Total num frames: 3162112. Throughput: 0: 799.6. Samples: 790376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:33:54,578][00707] Avg episode reward: [(0, '19.537')] +[2024-08-15 20:33:59,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3360.1). Total num frames: 3178496. Throughput: 0: 756.3. Samples: 794654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:33:59,579][00707] Avg episode reward: [(0, '20.131')] +[2024-08-15 20:34:02,947][03483] Updated weights for policy 0, policy_version 780 (0.0017) +[2024-08-15 20:34:04,573][00707] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 3198976. Throughput: 0: 853.8. Samples: 801040. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:34:04,574][00707] Avg episode reward: [(0, '21.194')] +[2024-08-15 20:34:09,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3360.1). Total num frames: 3215360. Throughput: 0: 830.7. Samples: 804178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:34:09,577][00707] Avg episode reward: [(0, '21.846')] +[2024-08-15 20:34:09,585][03470] Saving new best policy, reward=21.846! +[2024-08-15 20:34:14,574][00707] Fps is (10 sec: 2866.9, 60 sec: 3140.2, 300 sec: 3332.3). Total num frames: 3227648. Throughput: 0: 815.2. Samples: 807888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:34:14,576][00707] Avg episode reward: [(0, '21.836')] +[2024-08-15 20:34:16,061][03483] Updated weights for policy 0, policy_version 790 (0.0035) +[2024-08-15 20:34:19,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3277.2, 300 sec: 3360.1). Total num frames: 3248128. Throughput: 0: 895.6. Samples: 813448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:34:19,581][00707] Avg episode reward: [(0, '23.069')] +[2024-08-15 20:34:19,584][03470] Saving new best policy, reward=23.069! +[2024-08-15 20:34:24,572][00707] Fps is (10 sec: 4096.5, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 3268608. Throughput: 0: 881.8. Samples: 816496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:34:24,581][00707] Avg episode reward: [(0, '21.716')] +[2024-08-15 20:34:25,808][03483] Updated weights for policy 0, policy_version 800 (0.0040) +[2024-08-15 20:34:29,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3345.4, 300 sec: 3360.1). Total num frames: 3284992. Throughput: 0: 877.7. Samples: 821604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:34:29,581][00707] Avg episode reward: [(0, '20.352')] +[2024-08-15 20:34:34,572][00707] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3297280. Throughput: 0: 902.2. Samples: 825816. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:34:34,575][00707] Avg episode reward: [(0, '19.580')] +[2024-08-15 20:34:38,896][03483] Updated weights for policy 0, policy_version 810 (0.0041) +[2024-08-15 20:34:39,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 3317760. Throughput: 0: 855.9. Samples: 828892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:34:39,582][00707] Avg episode reward: [(0, '19.253')] +[2024-08-15 20:34:44,573][00707] Fps is (10 sec: 4095.8, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 3338240. Throughput: 0: 896.4. Samples: 834992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-08-15 20:34:44,580][00707] Avg episode reward: [(0, '19.623')] +[2024-08-15 20:34:49,575][00707] Fps is (10 sec: 3276.1, 60 sec: 3344.9, 300 sec: 3346.2). Total num frames: 3350528. Throughput: 0: 795.5. Samples: 836840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:34:49,581][00707] Avg episode reward: [(0, '19.794')] +[2024-08-15 20:34:51,901][03483] Updated weights for policy 0, policy_version 820 (0.0040) +[2024-08-15 20:34:54,572][00707] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3366912. Throughput: 0: 820.4. Samples: 841094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-15 20:34:54,575][00707] Avg episode reward: [(0, '21.046')] +[2024-08-15 20:34:59,572][00707] Fps is (10 sec: 4096.9, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 3391488. Throughput: 0: 877.8. Samples: 847390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:34:59,576][00707] Avg episode reward: [(0, '23.179')] +[2024-08-15 20:34:59,579][03470] Saving new best policy, reward=23.179! +[2024-08-15 20:35:01,591][03483] Updated weights for policy 0, policy_version 830 (0.0026) +[2024-08-15 20:35:04,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3403776. Throughput: 0: 865.7. Samples: 852404. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:35:04,580][00707] Avg episode reward: [(0, '23.313')] +[2024-08-15 20:35:04,597][03470] Saving new best policy, reward=23.313! +[2024-08-15 20:35:09,572][00707] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3416064. Throughput: 0: 837.2. Samples: 854172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:35:09,578][00707] Avg episode reward: [(0, '22.899')] +[2024-08-15 20:35:14,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3374.0). Total num frames: 3436544. Throughput: 0: 843.6. Samples: 859568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:35:14,575][00707] Avg episode reward: [(0, '24.301')] +[2024-08-15 20:35:14,600][03470] Saving new best policy, reward=24.301! +[2024-08-15 20:35:14,905][03483] Updated weights for policy 0, policy_version 840 (0.0035) +[2024-08-15 20:35:19,574][00707] Fps is (10 sec: 4095.5, 60 sec: 3481.5, 300 sec: 3360.1). Total num frames: 3457024. Throughput: 0: 818.5. Samples: 862650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:35:19,578][00707] Avg episode reward: [(0, '24.107')] +[2024-08-15 20:35:24,574][00707] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3346.2). Total num frames: 3473408. Throughput: 0: 862.1. Samples: 867688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:35:24,578][00707] Avg episode reward: [(0, '23.942')] +[2024-08-15 20:35:24,587][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth... +[2024-08-15 20:35:24,764][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000650_2662400.pth +[2024-08-15 20:35:27,745][03483] Updated weights for policy 0, policy_version 850 (0.0046) +[2024-08-15 20:35:29,572][00707] Fps is (10 sec: 2867.6, 60 sec: 3345.1, 300 sec: 3346.3). Total num frames: 3485696. Throughput: 0: 820.6. Samples: 871920. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:35:29,574][00707] Avg episode reward: [(0, '24.368')] +[2024-08-15 20:35:29,577][03470] Saving new best policy, reward=24.368! +[2024-08-15 20:35:34,572][00707] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 3506176. Throughput: 0: 847.8. Samples: 874988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:35:34,575][00707] Avg episode reward: [(0, '24.861')] +[2024-08-15 20:35:34,645][03470] Saving new best policy, reward=24.861! +[2024-08-15 20:35:37,562][03483] Updated weights for policy 0, policy_version 860 (0.0030) +[2024-08-15 20:35:39,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 3526656. Throughput: 0: 891.8. Samples: 881224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:35:39,578][00707] Avg episode reward: [(0, '24.614')] +[2024-08-15 20:35:44,573][00707] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3538944. Throughput: 0: 837.4. Samples: 885074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:35:44,575][00707] Avg episode reward: [(0, '24.395')] +[2024-08-15 20:35:49,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3374.0). Total num frames: 3559424. Throughput: 0: 778.5. Samples: 887438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:35:49,575][00707] Avg episode reward: [(0, '23.544')] +[2024-08-15 20:35:50,421][03483] Updated weights for policy 0, policy_version 870 (0.0028) +[2024-08-15 20:35:54,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 3579904. Throughput: 0: 884.0. Samples: 893950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:35:54,580][00707] Avg episode reward: [(0, '23.592')] +[2024-08-15 20:35:59,574][00707] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3387.9). Total num frames: 3592192. Throughput: 0: 868.7. Samples: 898662. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-08-15 20:35:59,577][00707] Avg episode reward: [(0, '22.158')] +[2024-08-15 20:36:04,577][00707] Fps is (10 sec: 2047.0, 60 sec: 3276.5, 300 sec: 3360.1). Total num frames: 3600384. Throughput: 0: 834.6. Samples: 900208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-08-15 20:36:04,582][00707] Avg episode reward: [(0, '21.357')] +[2024-08-15 20:36:04,823][03483] Updated weights for policy 0, policy_version 880 (0.0035) +[2024-08-15 20:36:09,572][00707] Fps is (10 sec: 2458.0, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 3616768. Throughput: 0: 795.0. Samples: 903462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:09,579][00707] Avg episode reward: [(0, '21.288')] +[2024-08-15 20:36:14,574][00707] Fps is (10 sec: 3687.4, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 3637248. Throughput: 0: 840.4. Samples: 909740. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:14,576][00707] Avg episode reward: [(0, '20.024')] +[2024-08-15 20:36:15,802][03483] Updated weights for policy 0, policy_version 890 (0.0029) +[2024-08-15 20:36:19,574][00707] Fps is (10 sec: 4095.5, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 3657728. Throughput: 0: 897.5. Samples: 915376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:19,576][00707] Avg episode reward: [(0, '19.982')] +[2024-08-15 20:36:24,573][00707] Fps is (10 sec: 3277.2, 60 sec: 3276.9, 300 sec: 3360.1). Total num frames: 3670016. Throughput: 0: 801.2. Samples: 917278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-08-15 20:36:24,581][00707] Avg episode reward: [(0, '19.274')] +[2024-08-15 20:36:28,201][03483] Updated weights for policy 0, policy_version 900 (0.0037) +[2024-08-15 20:36:29,572][00707] Fps is (10 sec: 3277.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 3690496. Throughput: 0: 833.0. Samples: 922558. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:29,580][00707] Avg episode reward: [(0, '19.108')] +[2024-08-15 20:36:34,575][00707] Fps is (10 sec: 4095.4, 60 sec: 3413.2, 300 sec: 3387.9). Total num frames: 3710976. Throughput: 0: 850.6. Samples: 925718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:34,582][00707] Avg episode reward: [(0, '20.138')] +[2024-08-15 20:36:39,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 3723264. Throughput: 0: 826.3. Samples: 931132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:39,578][00707] Avg episode reward: [(0, '20.931')] +[2024-08-15 20:36:39,802][03483] Updated weights for policy 0, policy_version 910 (0.0023) +[2024-08-15 20:36:44,573][00707] Fps is (10 sec: 2867.7, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 3739648. Throughput: 0: 804.0. Samples: 934840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:36:44,580][00707] Avg episode reward: [(0, '22.344')] +[2024-08-15 20:36:49,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 3760128. Throughput: 0: 912.7. Samples: 941276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:36:49,578][00707] Avg episode reward: [(0, '22.164')] +[2024-08-15 20:36:50,943][03483] Updated weights for policy 0, policy_version 920 (0.0020) +[2024-08-15 20:36:54,577][00707] Fps is (10 sec: 4094.4, 60 sec: 3344.8, 300 sec: 3387.8). Total num frames: 3780608. Throughput: 0: 911.7. Samples: 944494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-08-15 20:36:54,581][00707] Avg episode reward: [(0, '23.048')] +[2024-08-15 20:36:59,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3374.0). Total num frames: 3792896. Throughput: 0: 867.6. Samples: 948780. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:36:59,577][00707] Avg episode reward: [(0, '21.503')] +[2024-08-15 20:37:03,492][03483] Updated weights for policy 0, policy_version 930 (0.0059) +[2024-08-15 20:37:04,572][00707] Fps is (10 sec: 3278.2, 60 sec: 3550.1, 300 sec: 3401.8). Total num frames: 3813376. Throughput: 0: 787.4. Samples: 950806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:37:04,578][00707] Avg episode reward: [(0, '20.313')] +[2024-08-15 20:37:09,572][00707] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 3833856. Throughput: 0: 887.8. Samples: 957230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-08-15 20:37:09,579][00707] Avg episode reward: [(0, '18.663')] +[2024-08-15 20:37:14,520][03483] Updated weights for policy 0, policy_version 940 (0.0040) +[2024-08-15 20:37:14,572][00707] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3387.9). Total num frames: 3850240. Throughput: 0: 891.0. Samples: 962654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:14,575][00707] Avg episode reward: [(0, '19.576')] +[2024-08-15 20:37:19,573][00707] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 3862528. Throughput: 0: 863.2. Samples: 964560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:19,579][00707] Avg episode reward: [(0, '18.900')] +[2024-08-15 20:37:24,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3883008. Throughput: 0: 857.9. Samples: 969736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:24,577][00707] Avg episode reward: [(0, '20.156')] +[2024-08-15 20:37:24,589][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000948_3883008.pth... 
+[2024-08-15 20:37:24,718][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000750_3072000.pth +[2024-08-15 20:37:26,175][03483] Updated weights for policy 0, policy_version 950 (0.0038) +[2024-08-15 20:37:29,572][00707] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3903488. Throughput: 0: 916.2. Samples: 976068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:29,578][00707] Avg episode reward: [(0, '21.277')] +[2024-08-15 20:37:34,577][00707] Fps is (10 sec: 3275.2, 60 sec: 3413.2, 300 sec: 3373.9). Total num frames: 3915776. Throughput: 0: 870.5. Samples: 980454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:34,580][00707] Avg episode reward: [(0, '22.366')] +[2024-08-15 20:37:38,937][03483] Updated weights for policy 0, policy_version 960 (0.0043) +[2024-08-15 20:37:39,573][00707] Fps is (10 sec: 2867.0, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 3932160. Throughput: 0: 841.9. Samples: 982376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:39,576][00707] Avg episode reward: [(0, '23.350')] +[2024-08-15 20:37:44,572][00707] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3952640. Throughput: 0: 885.8. Samples: 988642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-08-15 20:37:44,575][00707] Avg episode reward: [(0, '22.682')] +[2024-08-15 20:37:49,017][03483] Updated weights for policy 0, policy_version 970 (0.0019) +[2024-08-15 20:37:49,572][00707] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3973120. Throughput: 0: 910.7. Samples: 991786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:49,575][00707] Avg episode reward: [(0, '23.608')] +[2024-08-15 20:37:54,572][00707] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 3387.9). Total num frames: 3985408. Throughput: 0: 862.3. Samples: 996032. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-08-15 20:37:54,576][00707] Avg episode reward: [(0, '23.267')] +[2024-08-15 20:37:59,378][03470] Stopping Batcher_0... +[2024-08-15 20:37:59,380][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-15 20:37:59,379][00707] Component Batcher_0 stopped! +[2024-08-15 20:37:59,381][03470] Loop batcher_evt_loop terminating... +[2024-08-15 20:37:59,427][00707] Component RolloutWorker_w4 stopped! +[2024-08-15 20:37:59,434][03489] Stopping RolloutWorker_w4... +[2024-08-15 20:37:59,435][03489] Loop rollout_proc4_evt_loop terminating... +[2024-08-15 20:37:59,460][00707] Component RolloutWorker_w6 stopped! +[2024-08-15 20:37:59,468][00707] Component RolloutWorker_w1 stopped! +[2024-08-15 20:37:59,462][03485] Stopping RolloutWorker_w1... +[2024-08-15 20:37:59,469][03490] Stopping RolloutWorker_w6... +[2024-08-15 20:37:59,478][00707] Component RolloutWorker_w0 stopped! +[2024-08-15 20:37:59,481][03487] Stopping RolloutWorker_w3... +[2024-08-15 20:37:59,484][00707] Component RolloutWorker_w3 stopped! +[2024-08-15 20:37:59,486][03485] Loop rollout_proc1_evt_loop terminating... +[2024-08-15 20:37:59,491][03487] Loop rollout_proc3_evt_loop terminating... +[2024-08-15 20:37:59,490][03484] Stopping RolloutWorker_w0... +[2024-08-15 20:37:59,475][03490] Loop rollout_proc6_evt_loop terminating... +[2024-08-15 20:37:59,492][03484] Loop rollout_proc0_evt_loop terminating... +[2024-08-15 20:37:59,495][03483] Weights refcount: 2 0 +[2024-08-15 20:37:59,512][03483] Stopping InferenceWorker_p0-w0... +[2024-08-15 20:37:59,513][03483] Loop inference_proc0-0_evt_loop terminating... +[2024-08-15 20:37:59,508][00707] Component RolloutWorker_w2 stopped! +[2024-08-15 20:37:59,515][00707] Component InferenceWorker_p0-w0 stopped! +[2024-08-15 20:37:59,524][03491] Stopping RolloutWorker_w7... +[2024-08-15 20:37:59,525][03491] Loop rollout_proc7_evt_loop terminating... 
+[2024-08-15 20:37:59,527][03486] Stopping RolloutWorker_w2... +[2024-08-15 20:37:59,528][03486] Loop rollout_proc2_evt_loop terminating... +[2024-08-15 20:37:59,527][00707] Component RolloutWorker_w7 stopped! +[2024-08-15 20:37:59,594][03488] Stopping RolloutWorker_w5... +[2024-08-15 20:37:59,599][03488] Loop rollout_proc5_evt_loop terminating... +[2024-08-15 20:37:59,594][00707] Component RolloutWorker_w5 stopped! +[2024-08-15 20:37:59,605][03470] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth +[2024-08-15 20:37:59,629][03470] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-15 20:37:59,818][00707] Component LearnerWorker_p0 stopped! +[2024-08-15 20:37:59,823][00707] Waiting for process learner_proc0 to stop... +[2024-08-15 20:37:59,825][03470] Stopping LearnerWorker_p0... +[2024-08-15 20:37:59,827][03470] Loop learner_proc0_evt_loop terminating... +[2024-08-15 20:38:01,415][00707] Waiting for process inference_proc0-0 to join... +[2024-08-15 20:38:01,421][00707] Waiting for process rollout_proc0 to join... +[2024-08-15 20:38:03,089][00707] Waiting for process rollout_proc1 to join... +[2024-08-15 20:38:03,251][00707] Waiting for process rollout_proc2 to join... +[2024-08-15 20:38:03,258][00707] Waiting for process rollout_proc3 to join... +[2024-08-15 20:38:03,261][00707] Waiting for process rollout_proc4 to join... +[2024-08-15 20:38:03,266][00707] Waiting for process rollout_proc5 to join... +[2024-08-15 20:38:03,271][00707] Waiting for process rollout_proc6 to join... +[2024-08-15 20:38:03,275][00707] Waiting for process rollout_proc7 to join... 
+[2024-08-15 20:38:03,279][00707] Batcher 0 profile tree view:
+batching: 29.4565, releasing_batches: 0.0323
+[2024-08-15 20:38:03,282][00707] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+ wait_policy_total: 456.4398
+update_model: 10.0442
+ weight_update: 0.0032
+one_step: 0.0135
+ handle_policy_step: 691.2505
+ deserialize: 17.6000, stack: 3.3830, obs_to_device_normalize: 138.0813, forward: 370.5980, send_messages: 34.1369
+ prepare_outputs: 92.4922
+ to_cpu: 53.1042
+[2024-08-15 20:38:03,284][00707] Learner 0 profile tree view:
+misc: 0.0062, prepare_batch: 16.2412
+train: 78.6321
+ epoch_init: 0.0109, minibatch_init: 0.0132, losses_postprocess: 0.7542, kl_divergence: 0.8164, after_optimizer: 34.2249
+ calculate_losses: 29.7311
+ losses_init: 0.0050, forward_head: 1.7408, bptt_initial: 19.6448, tail: 1.2918, advantages_returns: 0.2946, losses: 3.9830
+ bptt: 2.4163
+ bptt_forward_core: 2.3109
+ update: 12.3512
+ clip: 1.0309
+[2024-08-15 20:38:03,286][00707] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.4880, enqueue_policy_requests: 118.9660, env_step: 927.5259, overhead: 16.8053, complete_rollouts: 8.8667
+save_policy_outputs: 25.3622
+ split_output_tensors: 10.3231
+[2024-08-15 20:38:03,288][00707] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.4361, enqueue_policy_requests: 125.1801, env_step: 928.5158, overhead: 16.5994, complete_rollouts: 7.5925
+save_policy_outputs: 24.5104
+ split_output_tensors: 10.0572
+[2024-08-15 20:38:03,290][00707] Loop Runner_EvtLoop terminating...
+[2024-08-15 20:38:03,292][00707] Runner profile tree view: +main_loop: 1234.1837 +[2024-08-15 20:38:03,293][00707] Collected {0: 4005888}, FPS: 3245.8 +[2024-08-15 20:38:03,718][00707] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-08-15 20:38:03,721][00707] Overriding arg 'num_workers' with value 1 passed from command line +[2024-08-15 20:38:03,724][00707] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-08-15 20:38:03,726][00707] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-08-15 20:38:03,727][00707] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-08-15 20:38:03,731][00707] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-08-15 20:38:03,732][00707] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-08-15 20:38:03,733][00707] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-08-15 20:38:03,736][00707] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-08-15 20:38:03,737][00707] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-08-15 20:38:03,738][00707] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-08-15 20:38:03,739][00707] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-08-15 20:38:03,740][00707] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-08-15 20:38:03,742][00707] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-08-15 20:38:03,743][00707] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-08-15 20:38:03,778][00707] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-15 20:38:03,783][00707] RunningMeanStd input shape: (3, 72, 128) +[2024-08-15 20:38:03,785][00707] RunningMeanStd input shape: (1,) +[2024-08-15 20:38:03,805][00707] ConvEncoder: input_channels=3 +[2024-08-15 20:38:03,923][00707] Conv encoder output size: 512 +[2024-08-15 20:38:03,926][00707] Policy head output size: 512 +[2024-08-15 20:38:04,132][00707] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-15 20:38:04,931][00707] Num frames 100... +[2024-08-15 20:38:05,064][00707] Num frames 200... +[2024-08-15 20:38:05,186][00707] Num frames 300... +[2024-08-15 20:38:05,314][00707] Num frames 400... +[2024-08-15 20:38:05,451][00707] Num frames 500... +[2024-08-15 20:38:05,634][00707] Num frames 600... +[2024-08-15 20:38:05,812][00707] Num frames 700... +[2024-08-15 20:38:06,042][00707] Avg episode rewards: #0: 16.850, true rewards: #0: 7.850 +[2024-08-15 20:38:06,044][00707] Avg episode reward: 16.850, avg true_objective: 7.850 +[2024-08-15 20:38:06,082][00707] Num frames 800... +[2024-08-15 20:38:06,260][00707] Num frames 900... +[2024-08-15 20:38:06,452][00707] Num frames 1000... +[2024-08-15 20:38:06,627][00707] Num frames 1100... +[2024-08-15 20:38:06,814][00707] Num frames 1200... +[2024-08-15 20:38:06,994][00707] Avg episode rewards: #0: 13.325, true rewards: #0: 6.325 +[2024-08-15 20:38:06,997][00707] Avg episode reward: 13.325, avg true_objective: 6.325 +[2024-08-15 20:38:07,062][00707] Num frames 1300... +[2024-08-15 20:38:07,265][00707] Num frames 1400... +[2024-08-15 20:38:07,451][00707] Num frames 1500... +[2024-08-15 20:38:07,632][00707] Num frames 1600... +[2024-08-15 20:38:07,824][00707] Num frames 1700... +[2024-08-15 20:38:08,017][00707] Num frames 1800... 
+[2024-08-15 20:38:08,152][00707] Num frames 1900... +[2024-08-15 20:38:08,255][00707] Avg episode rewards: #0: 14.770, true rewards: #0: 6.437 +[2024-08-15 20:38:08,256][00707] Avg episode reward: 14.770, avg true_objective: 6.437 +[2024-08-15 20:38:08,347][00707] Num frames 2000... +[2024-08-15 20:38:08,473][00707] Num frames 2100... +[2024-08-15 20:38:08,599][00707] Num frames 2200... +[2024-08-15 20:38:08,728][00707] Num frames 2300... +[2024-08-15 20:38:08,803][00707] Avg episode rewards: #0: 12.038, true rewards: #0: 5.787 +[2024-08-15 20:38:08,805][00707] Avg episode reward: 12.038, avg true_objective: 5.787 +[2024-08-15 20:38:08,928][00707] Num frames 2400... +[2024-08-15 20:38:09,058][00707] Num frames 2500... +[2024-08-15 20:38:09,183][00707] Num frames 2600... +[2024-08-15 20:38:09,319][00707] Num frames 2700... +[2024-08-15 20:38:09,442][00707] Num frames 2800... +[2024-08-15 20:38:09,562][00707] Num frames 2900... +[2024-08-15 20:38:09,720][00707] Avg episode rewards: #0: 12.374, true rewards: #0: 5.974 +[2024-08-15 20:38:09,721][00707] Avg episode reward: 12.374, avg true_objective: 5.974 +[2024-08-15 20:38:09,742][00707] Num frames 3000... +[2024-08-15 20:38:09,863][00707] Num frames 3100... +[2024-08-15 20:38:09,992][00707] Num frames 3200... +[2024-08-15 20:38:10,121][00707] Num frames 3300... +[2024-08-15 20:38:10,253][00707] Num frames 3400... +[2024-08-15 20:38:10,381][00707] Num frames 3500... +[2024-08-15 20:38:10,523][00707] Avg episode rewards: #0: 12.105, true rewards: #0: 5.938 +[2024-08-15 20:38:10,524][00707] Avg episode reward: 12.105, avg true_objective: 5.938 +[2024-08-15 20:38:10,576][00707] Num frames 3600... +[2024-08-15 20:38:10,706][00707] Num frames 3700... +[2024-08-15 20:38:10,833][00707] Num frames 3800... +[2024-08-15 20:38:10,968][00707] Num frames 3900... +[2024-08-15 20:38:11,093][00707] Num frames 4000... +[2024-08-15 20:38:11,222][00707] Num frames 4100... +[2024-08-15 20:38:11,367][00707] Num frames 4200... 
+[2024-08-15 20:38:11,507][00707] Num frames 4300... +[2024-08-15 20:38:11,637][00707] Num frames 4400... +[2024-08-15 20:38:11,764][00707] Num frames 4500... +[2024-08-15 20:38:11,903][00707] Num frames 4600... +[2024-08-15 20:38:12,041][00707] Num frames 4700... +[2024-08-15 20:38:12,170][00707] Num frames 4800... +[2024-08-15 20:38:12,317][00707] Num frames 4900... +[2024-08-15 20:38:12,455][00707] Num frames 5000... +[2024-08-15 20:38:12,585][00707] Num frames 5100... +[2024-08-15 20:38:12,751][00707] Num frames 5200... +[2024-08-15 20:38:12,881][00707] Num frames 5300... +[2024-08-15 20:38:13,015][00707] Num frames 5400... +[2024-08-15 20:38:13,142][00707] Num frames 5500... +[2024-08-15 20:38:13,271][00707] Num frames 5600... +[2024-08-15 20:38:13,416][00707] Avg episode rewards: #0: 18.947, true rewards: #0: 8.090 +[2024-08-15 20:38:13,418][00707] Avg episode reward: 18.947, avg true_objective: 8.090 +[2024-08-15 20:38:13,467][00707] Num frames 5700... +[2024-08-15 20:38:13,604][00707] Num frames 5800... +[2024-08-15 20:38:13,744][00707] Num frames 5900... +[2024-08-15 20:38:13,871][00707] Num frames 6000... +[2024-08-15 20:38:14,003][00707] Num frames 6100... +[2024-08-15 20:38:14,136][00707] Num frames 6200... +[2024-08-15 20:38:14,266][00707] Num frames 6300... +[2024-08-15 20:38:14,400][00707] Num frames 6400... +[2024-08-15 20:38:14,530][00707] Num frames 6500... +[2024-08-15 20:38:14,664][00707] Num frames 6600... +[2024-08-15 20:38:14,801][00707] Num frames 6700... +[2024-08-15 20:38:14,940][00707] Num frames 6800... +[2024-08-15 20:38:15,102][00707] Avg episode rewards: #0: 20.474, true rewards: #0: 8.599 +[2024-08-15 20:38:15,103][00707] Avg episode reward: 20.474, avg true_objective: 8.599 +[2024-08-15 20:38:15,133][00707] Num frames 6900... +[2024-08-15 20:38:15,255][00707] Num frames 7000... +[2024-08-15 20:38:15,386][00707] Num frames 7100... +[2024-08-15 20:38:15,509][00707] Num frames 7200... 
+[2024-08-15 20:38:15,640][00707] Avg episode rewards: #0: 18.625, true rewards: #0: 8.070 +[2024-08-15 20:38:15,642][00707] Avg episode reward: 18.625, avg true_objective: 8.070 +[2024-08-15 20:38:15,691][00707] Num frames 7300... +[2024-08-15 20:38:15,814][00707] Num frames 7400... +[2024-08-15 20:38:15,950][00707] Num frames 7500... +[2024-08-15 20:38:16,074][00707] Num frames 7600... +[2024-08-15 20:38:16,195][00707] Num frames 7700... +[2024-08-15 20:38:16,321][00707] Num frames 7800... +[2024-08-15 20:38:16,489][00707] Avg episode rewards: #0: 17.671, true rewards: #0: 7.871 +[2024-08-15 20:38:16,492][00707] Avg episode reward: 17.671, avg true_objective: 7.871 +[2024-08-15 20:39:12,987][00707] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2024-08-15 20:59:02,286][00707] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-08-15 20:59:02,288][00707] Overriding arg 'num_workers' with value 1 passed from command line +[2024-08-15 20:59:02,290][00707] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-08-15 20:59:02,292][00707] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-08-15 20:59:02,294][00707] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-08-15 20:59:02,296][00707] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-08-15 20:59:02,298][00707] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-08-15 20:59:02,299][00707] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-08-15 20:59:02,300][00707] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-08-15 20:59:02,301][00707] Adding new argument 'hf_repository'='ib1368/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-08-15 20:59:02,302][00707] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-08-15 20:59:02,303][00707] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-08-15 20:59:02,304][00707] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-08-15 20:59:02,305][00707] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-08-15 20:59:02,306][00707] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-08-15 20:59:02,340][00707] RunningMeanStd input shape: (3, 72, 128) +[2024-08-15 20:59:02,342][00707] RunningMeanStd input shape: (1,) +[2024-08-15 20:59:02,358][00707] ConvEncoder: input_channels=3 +[2024-08-15 20:59:02,401][00707] Conv encoder output size: 512 +[2024-08-15 20:59:02,403][00707] Policy head output size: 512 +[2024-08-15 20:59:02,426][00707] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-15 20:59:02,891][00707] Num frames 100... +[2024-08-15 20:59:03,042][00707] Num frames 200... +[2024-08-15 20:59:03,172][00707] Num frames 300... +[2024-08-15 20:59:03,299][00707] Num frames 400... +[2024-08-15 20:59:03,431][00707] Num frames 500... +[2024-08-15 20:59:03,568][00707] Num frames 600... +[2024-08-15 20:59:03,693][00707] Num frames 700... +[2024-08-15 20:59:03,822][00707] Num frames 800... +[2024-08-15 20:59:03,966][00707] Avg episode rewards: #0: 16.640, true rewards: #0: 8.640 +[2024-08-15 20:59:03,968][00707] Avg episode reward: 16.640, avg true_objective: 8.640 +[2024-08-15 20:59:04,031][00707] Num frames 900... +[2024-08-15 20:59:04,162][00707] Num frames 1000... +[2024-08-15 20:59:04,291][00707] Num frames 1100... +[2024-08-15 20:59:04,415][00707] Num frames 1200... +[2024-08-15 20:59:04,541][00707] Num frames 1300... +[2024-08-15 20:59:04,664][00707] Num frames 1400... 
+[2024-08-15 20:59:04,736][00707] Avg episode rewards: #0: 13.540, true rewards: #0: 7.040 +[2024-08-15 20:59:04,738][00707] Avg episode reward: 13.540, avg true_objective: 7.040 +[2024-08-15 20:59:04,853][00707] Num frames 1500... +[2024-08-15 20:59:04,987][00707] Num frames 1600... +[2024-08-15 20:59:05,125][00707] Num frames 1700... +[2024-08-15 20:59:05,251][00707] Num frames 1800... +[2024-08-15 20:59:05,381][00707] Num frames 1900... +[2024-08-15 20:59:05,507][00707] Num frames 2000... +[2024-08-15 20:59:05,622][00707] Avg episode rewards: #0: 13.160, true rewards: #0: 6.827 +[2024-08-15 20:59:05,624][00707] Avg episode reward: 13.160, avg true_objective: 6.827 +[2024-08-15 20:59:05,691][00707] Num frames 2100... +[2024-08-15 20:59:05,817][00707] Num frames 2200... +[2024-08-15 20:59:05,958][00707] Num frames 2300... +[2024-08-15 20:59:06,092][00707] Num frames 2400... +[2024-08-15 20:59:06,218][00707] Num frames 2500... +[2024-08-15 20:59:06,355][00707] Num frames 2600... +[2024-08-15 20:59:06,447][00707] Avg episode rewards: #0: 12.310, true rewards: #0: 6.560 +[2024-08-15 20:59:06,448][00707] Avg episode reward: 12.310, avg true_objective: 6.560 +[2024-08-15 20:59:06,545][00707] Num frames 2700... +[2024-08-15 20:59:06,671][00707] Num frames 2800... +[2024-08-15 20:59:06,795][00707] Num frames 2900... +[2024-08-15 20:59:06,931][00707] Num frames 3000... +[2024-08-15 20:59:07,118][00707] Num frames 3100... +[2024-08-15 20:59:07,301][00707] Num frames 3200... +[2024-08-15 20:59:07,472][00707] Num frames 3300... +[2024-08-15 20:59:07,645][00707] Num frames 3400... +[2024-08-15 20:59:07,831][00707] Num frames 3500... +[2024-08-15 20:59:08,000][00707] Num frames 3600... +[2024-08-15 20:59:08,185][00707] Num frames 3700... +[2024-08-15 20:59:08,365][00707] Num frames 3800... +[2024-08-15 20:59:08,547][00707] Num frames 3900... +[2024-08-15 20:59:08,730][00707] Num frames 4000... 
+[2024-08-15 20:59:08,820][00707] Avg episode rewards: #0: 16.636, true rewards: #0: 8.036 +[2024-08-15 20:59:08,822][00707] Avg episode reward: 16.636, avg true_objective: 8.036 +[2024-08-15 20:59:08,968][00707] Num frames 4100... +[2024-08-15 20:59:09,152][00707] Num frames 4200... +[2024-08-15 20:59:09,344][00707] Num frames 4300... +[2024-08-15 20:59:09,532][00707] Num frames 4400... +[2024-08-15 20:59:09,718][00707] Num frames 4500... +[2024-08-15 20:59:09,843][00707] Num frames 4600... +[2024-08-15 20:59:09,981][00707] Num frames 4700... +[2024-08-15 20:59:10,161][00707] Avg episode rewards: #0: 17.160, true rewards: #0: 7.993 +[2024-08-15 20:59:10,162][00707] Avg episode reward: 17.160, avg true_objective: 7.993 +[2024-08-15 20:59:10,172][00707] Num frames 4800... +[2024-08-15 20:59:10,307][00707] Num frames 4900... +[2024-08-15 20:59:10,438][00707] Num frames 5000... +[2024-08-15 20:59:10,564][00707] Num frames 5100... +[2024-08-15 20:59:10,695][00707] Num frames 5200... +[2024-08-15 20:59:10,828][00707] Num frames 5300... +[2024-08-15 20:59:10,967][00707] Num frames 5400... +[2024-08-15 20:59:11,099][00707] Num frames 5500... +[2024-08-15 20:59:11,225][00707] Num frames 5600... +[2024-08-15 20:59:11,362][00707] Num frames 5700... +[2024-08-15 20:59:11,488][00707] Num frames 5800... +[2024-08-15 20:59:11,617][00707] Num frames 5900... +[2024-08-15 20:59:11,745][00707] Num frames 6000... +[2024-08-15 20:59:11,868][00707] Num frames 6100... +[2024-08-15 20:59:12,006][00707] Num frames 6200... +[2024-08-15 20:59:12,139][00707] Num frames 6300... +[2024-08-15 20:59:12,265][00707] Num frames 6400... +[2024-08-15 20:59:12,405][00707] Num frames 6500... +[2024-08-15 20:59:12,534][00707] Num frames 6600... +[2024-08-15 20:59:12,661][00707] Avg episode rewards: #0: 20.931, true rewards: #0: 9.503 +[2024-08-15 20:59:12,663][00707] Avg episode reward: 20.931, avg true_objective: 9.503 +[2024-08-15 20:59:12,725][00707] Num frames 6700... 
+[2024-08-15 20:59:12,854][00707] Num frames 6800... +[2024-08-15 20:59:12,993][00707] Num frames 6900... +[2024-08-15 20:59:13,128][00707] Num frames 7000... +[2024-08-15 20:59:13,256][00707] Num frames 7100... +[2024-08-15 20:59:13,353][00707] Avg episode rewards: #0: 19.165, true rewards: #0: 8.915 +[2024-08-15 20:59:13,354][00707] Avg episode reward: 19.165, avg true_objective: 8.915 +[2024-08-15 20:59:13,453][00707] Num frames 7200... +[2024-08-15 20:59:13,578][00707] Num frames 7300... +[2024-08-15 20:59:13,709][00707] Num frames 7400... +[2024-08-15 20:59:13,842][00707] Num frames 7500... +[2024-08-15 20:59:13,976][00707] Num frames 7600... +[2024-08-15 20:59:14,102][00707] Num frames 7700... +[2024-08-15 20:59:14,231][00707] Num frames 7800... +[2024-08-15 20:59:14,366][00707] Num frames 7900... +[2024-08-15 20:59:14,504][00707] Num frames 8000... +[2024-08-15 20:59:14,635][00707] Num frames 8100... +[2024-08-15 20:59:14,762][00707] Num frames 8200... +[2024-08-15 20:59:14,895][00707] Num frames 8300... +[2024-08-15 20:59:15,030][00707] Num frames 8400... +[2024-08-15 20:59:15,160][00707] Num frames 8500... +[2024-08-15 20:59:15,284][00707] Num frames 8600... +[2024-08-15 20:59:15,415][00707] Num frames 8700... +[2024-08-15 20:59:15,557][00707] Num frames 8800... +[2024-08-15 20:59:15,686][00707] Num frames 8900... +[2024-08-15 20:59:15,805][00707] Avg episode rewards: #0: 22.052, true rewards: #0: 9.941 +[2024-08-15 20:59:15,807][00707] Avg episode reward: 22.052, avg true_objective: 9.941 +[2024-08-15 20:59:15,880][00707] Num frames 9000... +[2024-08-15 20:59:16,015][00707] Num frames 9100... +[2024-08-15 20:59:16,143][00707] Num frames 9200... +[2024-08-15 20:59:16,270][00707] Num frames 9300... +[2024-08-15 20:59:16,400][00707] Num frames 9400... +[2024-08-15 20:59:16,533][00707] Num frames 9500... +[2024-08-15 20:59:16,655][00707] Num frames 9600... +[2024-08-15 20:59:16,782][00707] Num frames 9700... 
+[2024-08-15 20:59:16,916][00707] Num frames 9800... +[2024-08-15 20:59:17,046][00707] Num frames 9900... +[2024-08-15 20:59:17,176][00707] Num frames 10000... +[2024-08-15 20:59:17,311][00707] Num frames 10100... +[2024-08-15 20:59:17,444][00707] Num frames 10200... +[2024-08-15 20:59:17,586][00707] Num frames 10300... +[2024-08-15 20:59:17,720][00707] Num frames 10400... +[2024-08-15 20:59:17,853][00707] Num frames 10500... +[2024-08-15 20:59:17,986][00707] Num frames 10600... +[2024-08-15 20:59:18,115][00707] Num frames 10700... +[2024-08-15 20:59:18,181][00707] Avg episode rewards: #0: 24.207, true rewards: #0: 10.707 +[2024-08-15 20:59:18,183][00707] Avg episode reward: 24.207, avg true_objective: 10.707 +[2024-08-15 21:00:32,522][00707] Replay video saved to /content/train_dir/default_experiment/replay.mp4!