diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1,50 +1,39 @@
-[2024-09-29 15:49:05,181][00191] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2024-09-29 15:49:05,184][00191] Rollout worker 0 uses device cpu
-[2024-09-29 15:49:05,185][00191] Rollout worker 1 uses device cpu
-[2024-09-29 15:49:05,187][00191] Rollout worker 2 uses device cpu
-[2024-09-29 15:49:05,188][00191] Rollout worker 3 uses device cpu
-[2024-09-29 15:49:05,189][00191] Rollout worker 4 uses device cpu
-[2024-09-29 15:49:05,190][00191] Rollout worker 5 uses device cpu
-[2024-09-29 15:49:05,192][00191] Rollout worker 6 uses device cpu
-[2024-09-29 15:49:05,193][00191] Rollout worker 7 uses device cpu
-[2024-09-29 15:49:05,347][00191] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-29 15:49:05,349][00191] InferenceWorker_p0-w0: min num requests: 2
-[2024-09-29 15:49:05,386][00191] Starting all processes...
-[2024-09-29 15:49:05,388][00191] Starting process learner_proc0
-[2024-09-29 15:49:06,048][00191] Starting all processes...
-[2024-09-29 15:49:06,058][00191] Starting process inference_proc0-0
-[2024-09-29 15:49:06,059][00191] Starting process rollout_proc0
-[2024-09-29 15:49:06,060][00191] Starting process rollout_proc1
-[2024-09-29 15:49:06,061][00191] Starting process rollout_proc2
-[2024-09-29 15:49:06,061][00191] Starting process rollout_proc3
-[2024-09-29 15:49:06,061][00191] Starting process rollout_proc4
-[2024-09-29 15:49:06,061][00191] Starting process rollout_proc5
-[2024-09-29 15:49:06,061][00191] Starting process rollout_proc6
-[2024-09-29 15:49:06,061][00191] Starting process rollout_proc7
-[2024-09-29 15:49:21,749][05174] Worker 6 uses CPU cores [0]
-[2024-09-29 15:49:21,915][05169] Worker 3 uses CPU cores [1]
-[2024-09-29 15:49:21,949][05170] Worker 2 uses CPU cores [0]
-[2024-09-29 15:49:22,011][05167] Worker 1 uses CPU cores [1]
-[2024-09-29 15:49:22,077][05153] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-29 15:49:22,083][05153] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2024-09-29 15:49:22,124][05153] Num visible devices: 1
-[2024-09-29 15:49:22,164][05153] Starting seed is not provided
-[2024-09-29 15:49:22,165][05153] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-29 15:49:22,165][05153] Initializing actor-critic model on device cuda:0
-[2024-09-29 15:49:22,166][05153] RunningMeanStd input shape: (3, 72, 128)
-[2024-09-29 15:49:22,169][05153] RunningMeanStd input shape: (1,)
-[2024-09-29 15:49:22,203][05166] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-29 15:49:22,204][05166] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2024-09-29 15:49:22,239][05153] ConvEncoder: input_channels=3
-[2024-09-29 15:49:22,242][05166] Num visible devices: 1
-[2024-09-29 15:49:22,299][05172] Worker 5 uses CPU cores [1]
-[2024-09-29 15:49:22,317][05173] Worker 7 uses CPU cores [1]
-[2024-09-29 15:49:22,323][05168] Worker 0 uses CPU cores [0]
-[2024-09-29 15:49:22,345][05171] Worker 4 uses CPU cores [0]
-[2024-09-29 15:49:22,506][05153] Conv encoder output size: 512
-[2024-09-29 15:49:22,506][05153] Policy head output size: 512
-[2024-09-29 15:49:22,562][05153] Created Actor Critic model with architecture:
-[2024-09-29 15:49:22,563][05153] ActorCriticSharedWeights(
+[2024-09-30 00:25:00,956][1148693] Saving configuration to /home/luyang/workspace/rl/train_dir/default_experiment/config.json...
+[2024-09-30 00:25:00,961][1148693] Rollout worker 0 uses device cpu
+[2024-09-30 00:25:00,961][1148693] Rollout worker 1 uses device cpu
+[2024-09-30 00:25:00,961][1148693] Rollout worker 2 uses device cpu
+[2024-09-30 00:25:00,961][1148693] Rollout worker 3 uses device cpu
+[2024-09-30 00:25:00,961][1148693] Rollout worker 4 uses device cpu
+[2024-09-30 00:25:00,961][1148693] Rollout worker 5 uses device cpu
+[2024-09-30 00:25:00,961][1148693] Rollout worker 6 uses device cpu
+[2024-09-30 00:25:00,962][1148693] Rollout worker 7 uses device cpu
+[2024-09-30 00:25:01,008][1148693] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:25:01,008][1148693] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-30 00:25:01,042][1148693] Starting all processes...
+[2024-09-30 00:25:01,042][1148693] Starting process learner_proc0
+[2024-09-30 00:25:02,676][1148693] Starting all processes...
+[2024-09-30 00:25:02,680][1148981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:25:02,680][1148981] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-30 00:25:02,680][1148693] Starting process inference_proc0-0
+[2024-09-30 00:25:02,680][1148693] Starting process rollout_proc0
+[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc1
+[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc2
+[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc3
+[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc4
+[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc5
+[2024-09-30 00:25:02,686][1148693] Starting process rollout_proc6
+[2024-09-30 00:25:02,686][1148693] Starting process rollout_proc7
+[2024-09-30 00:25:02,712][1148981] Num visible devices: 1
+[2024-09-30 00:25:02,719][1148981] Starting seed is not provided
+[2024-09-30 00:25:02,719][1148981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:25:02,719][1148981] Initializing actor-critic model on device cuda:0
+[2024-09-30 00:25:02,719][1148981] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-30 00:25:02,720][1148981] RunningMeanStd input shape: (1,)
+[2024-09-30 00:25:02,729][1148981] ConvEncoder: input_channels=3
+[2024-09-30 00:25:02,801][1148981] Conv encoder output size: 512
+[2024-09-30 00:25:02,801][1148981] Policy head output size: 512
+[2024-09-30 00:25:02,812][1148981] Created Actor Critic model with architecture:
+[2024-09-30 00:25:02,813][1148981] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -85,1004 +74,575 @@
     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
   )
 )
-[2024-09-29 15:49:22,944][05153] Using optimizer
-[2024-09-29 15:49:23,600][05153] No checkpoints found
-[2024-09-29 15:49:23,600][05153] Did not load from checkpoint, starting from scratch!
-[2024-09-29 15:49:23,601][05153] Initialized policy 0 weights for model version 0
-[2024-09-29 15:49:23,605][05153] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-29 15:49:23,611][05153] LearnerWorker_p0 finished initialization!
-[2024-09-29 15:49:23,796][05166] RunningMeanStd input shape: (3, 72, 128)
-[2024-09-29 15:49:23,797][05166] RunningMeanStd input shape: (1,)
-[2024-09-29 15:49:23,809][05166] ConvEncoder: input_channels=3
-[2024-09-29 15:49:23,910][05166] Conv encoder output size: 512
-[2024-09-29 15:49:23,911][05166] Policy head output size: 512
-[2024-09-29 15:49:23,961][00191] Inference worker 0-0 is ready!
-[2024-09-29 15:49:23,962][00191] All inference workers are ready! Signal rollout workers to start!
-[2024-09-29 15:49:24,158][05172] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,153][05167] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,163][05174] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,162][05169] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,163][05173] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,165][05168] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,157][05171] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,161][05170] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 15:49:24,931][00191] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-09-29 15:49:25,182][05168] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:25,184][05174] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:25,340][00191] Heartbeat connected on Batcher_0
-[2024-09-29 15:49:25,345][00191] Heartbeat connected on LearnerWorker_p0
-[2024-09-29 15:49:25,374][00191] Heartbeat connected on InferenceWorker_p0-w0
-[2024-09-29 15:49:25,525][05172] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:25,525][05169] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:25,529][05167] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:25,915][05167] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:26,317][05168] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:26,447][05171] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:26,480][05167] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:27,023][05174] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:27,038][05170] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:28,085][05172] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:28,144][05171] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:28,595][05167] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:28,901][00191] Heartbeat connected on RolloutWorker_w1
-[2024-09-29 15:49:28,913][05168] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:28,947][05170] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:29,070][05173] Decorrelating experience for 0 frames...
-[2024-09-29 15:49:29,902][05169] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:29,931][00191] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-09-29 15:49:30,406][05172] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:31,549][05171] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:31,817][05174] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:31,977][05168] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:32,267][00191] Heartbeat connected on RolloutWorker_w0
-[2024-09-29 15:49:32,451][05170] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:32,693][05173] Decorrelating experience for 32 frames...
-[2024-09-29 15:49:33,994][05174] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:34,126][05169] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:34,278][00191] Heartbeat connected on RolloutWorker_w6
-[2024-09-29 15:49:34,348][05172] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:34,547][05170] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:34,781][00191] Heartbeat connected on RolloutWorker_w5
-[2024-09-29 15:49:34,797][00191] Heartbeat connected on RolloutWorker_w2
-[2024-09-29 15:49:34,931][00191] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 37.0. Samples: 370. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-09-29 15:49:34,936][00191] Avg episode reward: [(0, '2.559')]
-[2024-09-29 15:49:36,018][05173] Decorrelating experience for 64 frames...
-[2024-09-29 15:49:36,391][05171] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:37,010][00191] Heartbeat connected on RolloutWorker_w4
-[2024-09-29 15:49:38,505][05169] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:38,599][05153] Signal inference workers to stop experience collection...
-[2024-09-29 15:49:38,596][05173] Decorrelating experience for 96 frames...
-[2024-09-29 15:49:38,642][05166] InferenceWorker_p0-w0: stopping experience collection
-[2024-09-29 15:49:38,695][00191] Heartbeat connected on RolloutWorker_w3
-[2024-09-29 15:49:38,762][00191] Heartbeat connected on RolloutWorker_w7
-[2024-09-29 15:49:39,931][00191] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 206.8. Samples: 3102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-09-29 15:49:39,937][00191] Avg episode reward: [(0, '2.844')]
-[2024-09-29 15:49:40,748][05153] Signal inference workers to resume experience collection...
-[2024-09-29 15:49:40,753][05166] InferenceWorker_p0-w0: resuming experience collection
-[2024-09-29 15:49:44,936][00191] Fps is (10 sec: 2047.0, 60 sec: 1023.7, 300 sec: 1023.7). Total num frames: 20480. Throughput: 0: 191.0. Samples: 3820. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
-[2024-09-29 15:49:44,941][00191] Avg episode reward: [(0, '3.301')]
-[2024-09-29 15:49:49,931][00191] Fps is (10 sec: 3686.3, 60 sec: 1474.5, 300 sec: 1474.5). Total num frames: 36864. Throughput: 0: 345.0. Samples: 8624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:49:49,936][00191] Avg episode reward: [(0, '3.664')]
-[2024-09-29 15:49:50,751][05166] Updated weights for policy 0, policy_version 10 (0.0048)
-[2024-09-29 15:49:54,931][00191] Fps is (10 sec: 4097.9, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 485.9. Samples: 14578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-29 15:49:54,938][00191] Avg episode reward: [(0, '4.352')]
-[2024-09-29 15:49:59,426][05166] Updated weights for policy 0, policy_version 20 (0.0024)
-[2024-09-29 15:49:59,931][00191] Fps is (10 sec: 4505.7, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 519.1. Samples: 18170. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-09-29 15:49:59,938][00191] Avg episode reward: [(0, '4.723')]
-[2024-09-29 15:50:04,931][00191] Fps is (10 sec: 3686.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 600.0. Samples: 24002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:50:04,936][00191] Avg episode reward: [(0, '4.487')]
-[2024-09-29 15:50:09,931][00191] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 114688. Throughput: 0: 644.6. Samples: 29008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:50:09,934][00191] Avg episode reward: [(0, '4.105')]
-[2024-09-29 15:50:09,942][05153] Saving new best policy, reward=4.105!
-[2024-09-29 15:50:10,894][05166] Updated weights for policy 0, policy_version 30 (0.0030)
-[2024-09-29 15:50:14,931][00191] Fps is (10 sec: 4096.0, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 716.2. Samples: 32230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:50:14,937][00191] Avg episode reward: [(0, '4.482')]
-[2024-09-29 15:50:14,943][05153] Saving new best policy, reward=4.482!
-[2024-09-29 15:50:19,933][00191] Fps is (10 sec: 4095.4, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 155648. Throughput: 0: 859.6. Samples: 39054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:50:19,936][00191] Avg episode reward: [(0, '4.632')]
-[2024-09-29 15:50:19,998][05153] Saving new best policy, reward=4.632!
-[2024-09-29 15:50:21,629][05166] Updated weights for policy 0, policy_version 40 (0.0021)
-[2024-09-29 15:50:24,931][00191] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 893.6. Samples: 43314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:50:24,938][00191] Avg episode reward: [(0, '4.450')]
-[2024-09-29 15:50:29,939][00191] Fps is (10 sec: 4093.5, 60 sec: 3276.4, 300 sec: 3024.4). Total num frames: 196608. Throughput: 0: 953.3. Samples: 46722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-09-29 15:50:29,941][00191] Avg episode reward: [(0, '4.332')]
-[2024-09-29 15:50:31,027][05166] Updated weights for policy 0, policy_version 50 (0.0036)
-[2024-09-29 15:50:34,933][00191] Fps is (10 sec: 4914.3, 60 sec: 3686.3, 300 sec: 3159.7). Total num frames: 221184. Throughput: 0: 1005.8. Samples: 53888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-09-29 15:50:34,937][00191] Avg episode reward: [(0, '4.357')]
-[2024-09-29 15:50:39,931][00191] Fps is (10 sec: 3689.2, 60 sec: 3891.2, 300 sec: 3113.0). Total num frames: 233472. Throughput: 0: 983.4. Samples: 58830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-09-29 15:50:39,937][00191] Avg episode reward: [(0, '4.365')]
-[2024-09-29 15:50:42,629][05166] Updated weights for policy 0, policy_version 60 (0.0028)
-[2024-09-29 15:50:44,931][00191] Fps is (10 sec: 3277.4, 60 sec: 3891.5, 300 sec: 3174.4). Total num frames: 253952. Throughput: 0: 955.7. Samples: 61176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:50:44,937][00191] Avg episode reward: [(0, '4.468')]
-[2024-09-29 15:50:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 985.4. Samples: 68346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:50:49,936][00191] Avg episode reward: [(0, '4.519')]
-[2024-09-29 15:50:51,255][05166] Updated weights for policy 0, policy_version 70 (0.0021)
-[2024-09-29 15:50:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 294912. Throughput: 0: 1004.2. Samples: 74196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:50:54,934][00191] Avg episode reward: [(0, '4.502')]
-[2024-09-29 15:50:59,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 311296. Throughput: 0: 980.0. Samples: 76330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:50:59,939][00191] Avg episode reward: [(0, '4.368')]
-[2024-09-29 15:50:59,947][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth...
-[2024-09-29 15:51:02,777][05166] Updated weights for policy 0, policy_version 80 (0.0036)
-[2024-09-29 15:51:04,931][00191] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 970.6. Samples: 82730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:51:04,934][00191] Avg episode reward: [(0, '4.287')]
-[2024-09-29 15:51:09,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 1025.6. Samples: 89464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:51:09,933][00191] Avg episode reward: [(0, '4.269')]
-[2024-09-29 15:51:13,978][05166] Updated weights for policy 0, policy_version 90 (0.0055)
-[2024-09-29 15:51:14,933][00191] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3351.2). Total num frames: 368640. Throughput: 0: 995.7. Samples: 91524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:51:14,935][00191] Avg episode reward: [(0, '4.264')]
-[2024-09-29 15:51:19,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 954.1. Samples: 96822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:51:19,933][00191] Avg episode reward: [(0, '4.571')]
-[2024-09-29 15:51:23,247][05166] Updated weights for policy 0, policy_version 100 (0.0045)
-[2024-09-29 15:51:24,932][00191] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 1004.0. Samples: 104010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:51:24,937][00191] Avg episode reward: [(0, '4.714')]
-[2024-09-29 15:51:24,956][05153] Saving new best policy, reward=4.714!
-[2024-09-29 15:51:29,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.7, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 1015.5. Samples: 106874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-29 15:51:29,936][00191] Avg episode reward: [(0, '4.681')]
-[2024-09-29 15:51:34,446][05166] Updated weights for policy 0, policy_version 110 (0.0019)
-[2024-09-29 15:51:34,931][00191] Fps is (10 sec: 3686.6, 60 sec: 3823.0, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 954.6. Samples: 111304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:51:34,934][00191] Avg episode reward: [(0, '4.717')]
-[2024-09-29 15:51:34,939][05153] Saving new best policy, reward=4.717!
-[2024-09-29 15:51:39,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3519.5). Total num frames: 475136. Throughput: 0: 982.1. Samples: 118390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:51:39,939][00191] Avg episode reward: [(0, '4.711')]
-[2024-09-29 15:51:44,026][05166] Updated weights for policy 0, policy_version 120 (0.0031)
-[2024-09-29 15:51:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 1012.5. Samples: 121892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:51:44,933][00191] Avg episode reward: [(0, '4.703')]
-[2024-09-29 15:51:49,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 970.9. Samples: 126420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:51:49,938][00191] Avg episode reward: [(0, '4.723')]
-[2024-09-29 15:51:49,948][05153] Saving new best policy, reward=4.723!
-[2024-09-29 15:51:54,872][05166] Updated weights for policy 0, policy_version 130 (0.0038)
-[2024-09-29 15:51:54,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 532480. Throughput: 0: 958.1. Samples: 132578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:51:54,934][00191] Avg episode reward: [(0, '4.676')]
-[2024-09-29 15:51:59,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 992.5. Samples: 136186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:51:59,935][00191] Avg episode reward: [(0, '4.553')]
-[2024-09-29 15:52:04,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 999.6. Samples: 141806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:52:04,938][00191] Avg episode reward: [(0, '4.543')]
-[2024-09-29 15:52:06,046][05166] Updated weights for policy 0, policy_version 140 (0.0039)
-[2024-09-29 15:52:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 955.9. Samples: 147024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-29 15:52:09,933][00191] Avg episode reward: [(0, '4.675')]
-[2024-09-29 15:52:14,931][00191] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 971.2. Samples: 150576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-29 15:52:14,934][00191] Avg episode reward: [(0, '4.704')]
-[2024-09-29 15:52:15,216][05166] Updated weights for policy 0, policy_version 150 (0.0049)
-[2024-09-29 15:52:19,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 1016.6. Samples: 157052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:52:19,937][00191] Avg episode reward: [(0, '4.877')]
-[2024-09-29 15:52:19,949][05153] Saving new best policy, reward=4.877!
-[2024-09-29 15:52:24,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3595.4). Total num frames: 647168. Throughput: 0: 957.4. Samples: 161474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:52:24,938][00191] Avg episode reward: [(0, '4.718')]
-[2024-09-29 15:52:26,491][05166] Updated weights for policy 0, policy_version 160 (0.0030)
-[2024-09-29 15:52:29,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3608.9). Total num frames: 667648. Throughput: 0: 959.3. Samples: 165060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:52:29,936][00191] Avg episode reward: [(0, '4.504')]
-[2024-09-29 15:52:34,936][00191] Fps is (10 sec: 4503.6, 60 sec: 4027.4, 300 sec: 3643.2). Total num frames: 692224. Throughput: 0: 1016.7. Samples: 172178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:52:34,938][00191] Avg episode reward: [(0, '5.020')]
-[2024-09-29 15:52:34,940][05153] Saving new best policy, reward=5.020!
-[2024-09-29 15:52:35,833][05166] Updated weights for policy 0, policy_version 170 (0.0043)
-[2024-09-29 15:52:39,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 981.6. Samples: 176750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:52:39,938][00191] Avg episode reward: [(0, '5.194')]
-[2024-09-29 15:52:39,951][05153] Saving new best policy, reward=5.194!
-[2024-09-29 15:52:44,931][00191] Fps is (10 sec: 3278.2, 60 sec: 3891.2, 300 sec: 3625.0). Total num frames: 724992. Throughput: 0: 962.9. Samples: 179516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:52:44,933][00191] Avg episode reward: [(0, '4.952')]
-[2024-09-29 15:52:46,694][05166] Updated weights for policy 0, policy_version 180 (0.0033)
-[2024-09-29 15:52:49,931][00191] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3656.4). Total num frames: 749568. Throughput: 0: 992.0. Samples: 186446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:52:49,934][00191] Avg episode reward: [(0, '4.852')]
-[2024-09-29 15:52:54,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3647.4). Total num frames: 765952. Throughput: 0: 999.7. Samples: 192012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:52:54,934][00191] Avg episode reward: [(0, '4.821')]
-[2024-09-29 15:52:57,846][05166] Updated weights for policy 0, policy_version 190 (0.0032)
-[2024-09-29 15:52:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3657.8). Total num frames: 786432. Throughput: 0: 970.1. Samples: 194232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:52:59,936][00191] Avg episode reward: [(0, '5.111')]
-[2024-09-29 15:52:59,945][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
-[2024-09-29 15:53:04,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3667.8). Total num frames: 806912. Throughput: 0: 978.4. Samples: 201080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:53:04,936][00191] Avg episode reward: [(0, '5.469')]
-[2024-09-29 15:53:04,938][05153] Saving new best policy, reward=5.469!
-[2024-09-29 15:53:06,711][05166] Updated weights for policy 0, policy_version 200 (0.0026)
-[2024-09-29 15:53:09,935][00191] Fps is (10 sec: 4094.5, 60 sec: 3959.2, 300 sec: 3677.2). Total num frames: 827392. Throughput: 0: 1022.5. Samples: 207488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:53:09,938][00191] Avg episode reward: [(0, '5.560')]
-[2024-09-29 15:53:09,950][05153] Saving new best policy, reward=5.560!
-[2024-09-29 15:53:14,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3668.6). Total num frames: 843776. Throughput: 0: 989.0. Samples: 209566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:53:14,934][00191] Avg episode reward: [(0, '5.574')]
-[2024-09-29 15:53:14,936][05153] Saving new best policy, reward=5.574!
-[2024-09-29 15:53:18,293][05166] Updated weights for policy 0, policy_version 210 (0.0038)
-[2024-09-29 15:53:19,931][00191] Fps is (10 sec: 3687.6, 60 sec: 3891.2, 300 sec: 3677.7). Total num frames: 864256. Throughput: 0: 957.9. Samples: 215278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:53:19,934][00191] Avg episode reward: [(0, '5.471')]
-[2024-09-29 15:53:24,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3703.5). Total num frames: 888832. Throughput: 0: 1016.9. Samples: 222512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:53:24,934][00191] Avg episode reward: [(0, '5.514')]
-[2024-09-29 15:53:28,043][05166] Updated weights for policy 0, policy_version 220 (0.0026)
-[2024-09-29 15:53:29,937][00191] Fps is (10 sec: 4093.8, 60 sec: 3959.1, 300 sec: 3694.7). Total num frames: 905216. Throughput: 0: 1013.4. Samples: 225124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:53:29,939][00191] Avg episode reward: [(0, '5.419')]
-[2024-09-29 15:53:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3702.8). Total num frames: 925696. Throughput: 0: 970.7. Samples: 230128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:53:34,938][00191] Avg episode reward: [(0, '5.799')]
-[2024-09-29 15:53:34,940][05153] Saving new best policy, reward=5.799!
-[2024-09-29 15:53:37,987][05166] Updated weights for policy 0, policy_version 230 (0.0040)
-[2024-09-29 15:53:39,931][00191] Fps is (10 sec: 4508.1, 60 sec: 4096.0, 300 sec: 3726.6). Total num frames: 950272. Throughput: 0: 1004.6. Samples: 237220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:53:39,933][00191] Avg episode reward: [(0, '5.829')]
-[2024-09-29 15:53:39,943][05153] Saving new best policy, reward=5.829!
-[2024-09-29 15:53:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3717.9). Total num frames: 966656. Throughput: 0: 1031.2. Samples: 240638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:53:44,934][00191] Avg episode reward: [(0, '5.453')]
-[2024-09-29 15:53:49,647][05166] Updated weights for policy 0, policy_version 240 (0.0020)
-[2024-09-29 15:53:49,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3709.6). Total num frames: 983040. Throughput: 0: 973.9. Samples: 244904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:53:49,934][00191] Avg episode reward: [(0, '5.692')]
-[2024-09-29 15:53:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3731.9). Total num frames: 1007616. Throughput: 0: 977.6. Samples: 251476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:53:54,935][00191] Avg episode reward: [(0, '5.735')]
-[2024-09-29 15:53:58,153][05166] Updated weights for policy 0, policy_version 250 (0.0027)
-[2024-09-29 15:53:59,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3738.5). Total num frames: 1028096. Throughput: 0: 1011.4. Samples: 255078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 15:53:59,936][00191] Avg episode reward: [(0, '5.589')]
-[2024-09-29 15:54:04,938][00191] Fps is (10 sec: 3684.0, 60 sec: 3959.0, 300 sec: 3730.2). Total num frames: 1044480. Throughput: 0: 1000.4. Samples: 260302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:54:04,942][00191] Avg episode reward: [(0, '5.789')]
-[2024-09-29 15:54:09,845][05166] Updated weights for policy 0, policy_version 260 (0.0031)
-[2024-09-29 15:54:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 961.2. Samples: 265768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:54:09,934][00191] Avg episode reward: [(0, '5.461')]
-[2024-09-29 15:54:14,931][00191] Fps is (10 sec: 4098.6, 60 sec: 4027.7, 300 sec: 3742.9). Total num frames: 1085440. Throughput: 0: 981.6. Samples: 269292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:54:14,935][00191] Avg episode reward: [(0, '5.681')]
-[2024-09-29 15:54:19,939][00191] Fps is (10 sec: 3683.6, 60 sec: 3959.0, 300 sec: 3734.9). Total num frames: 1101824. Throughput: 0: 1009.9. Samples: 275580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:54:19,943][00191] Avg episode reward: [(0, '5.758')]
-[2024-09-29 15:54:20,129][05166] Updated weights for policy 0, policy_version 270 (0.0021)
-[2024-09-29 15:54:24,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 954.3. Samples: 280162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:54:24,935][00191] Avg episode reward: [(0, '5.638')]
-[2024-09-29 15:54:29,836][05166] Updated weights for policy 0, policy_version 280 (0.0041)
-[2024-09-29 15:54:29,931][00191] Fps is (10 sec: 4509.0, 60 sec: 4028.1, 300 sec: 3887.7). Total num frames: 1146880. Throughput: 0: 959.1. Samples: 283796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:54:29,934][00191] Avg episode reward: [(0, '5.973')]
-[2024-09-29 15:54:29,944][05153] Saving new best policy, reward=5.973!
-[2024-09-29 15:54:34,934][00191] Fps is (10 sec: 4504.4, 60 sec: 4027.6, 300 sec: 3957.1). Total num frames: 1167360. Throughput: 0: 1023.5. Samples: 290964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:54:34,943][00191] Avg episode reward: [(0, '6.115')]
-[2024-09-29 15:54:34,947][05153] Saving new best policy, reward=6.115!
-[2024-09-29 15:54:39,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1179648. Throughput: 0: 976.6. Samples: 295422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:54:39,937][00191] Avg episode reward: [(0, '6.182')]
-[2024-09-29 15:54:39,948][05153] Saving new best policy, reward=6.182!
-[2024-09-29 15:54:41,406][05166] Updated weights for policy 0, policy_version 290 (0.0037)
-[2024-09-29 15:54:44,931][00191] Fps is (10 sec: 3277.6, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1200128. Throughput: 0: 958.8. Samples: 298226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:54:44,934][00191] Avg episode reward: [(0, '6.120')]
-[2024-09-29 15:54:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1224704. Throughput: 0: 998.4. Samples: 305222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:54:49,934][00191] Avg episode reward: [(0, '6.299')]
-[2024-09-29 15:54:49,945][05153] Saving new best policy, reward=6.299!
-[2024-09-29 15:54:50,229][05166] Updated weights for policy 0, policy_version 300 (0.0032)
-[2024-09-29 15:54:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1241088. Throughput: 0: 995.0. Samples: 310544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-29 15:54:54,936][00191] Avg episode reward: [(0, '6.475')]
-[2024-09-29 15:54:54,940][05153] Saving new best policy, reward=6.475!
-[2024-09-29 15:54:59,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1257472. Throughput: 0: 964.0. Samples: 312672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:54:59,937][00191] Avg episode reward: [(0, '6.604')]
-[2024-09-29 15:54:59,972][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth...
-[2024-09-29 15:55:00,104][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth
-[2024-09-29 15:55:00,121][05153] Saving new best policy, reward=6.604!
-[2024-09-29 15:55:02,014][05166] Updated weights for policy 0, policy_version 310 (0.0040)
-[2024-09-29 15:55:04,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3957.2). Total num frames: 1282048. Throughput: 0: 970.3. Samples: 319236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:55:04,934][00191] Avg episode reward: [(0, '6.852')]
-[2024-09-29 15:55:04,938][05153] Saving new best policy, reward=6.852!
-[2024-09-29 15:55:09,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1298432. Throughput: 0: 1008.4. Samples: 325540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-09-29 15:55:09,938][00191] Avg episode reward: [(0, '6.987')]
-[2024-09-29 15:55:09,952][05153] Saving new best policy, reward=6.987!
-[2024-09-29 15:55:14,931][00191] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1306624. Throughput: 0: 950.6. Samples: 326574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:55:14,934][00191] Avg episode reward: [(0, '7.160')]
-[2024-09-29 15:55:14,939][05153] Saving new best policy, reward=7.160!
-[2024-09-29 15:55:16,238][05166] Updated weights for policy 0, policy_version 320 (0.0031)
-[2024-09-29 15:55:19,933][00191] Fps is (10 sec: 2866.7, 60 sec: 3755.0, 300 sec: 3915.5). Total num frames: 1327104. Throughput: 0: 875.0. Samples: 330338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:55:19,943][00191] Avg episode reward: [(0, '7.535')]
-[2024-09-29 15:55:19,952][05153] Saving new best policy, reward=7.535!
-[2024-09-29 15:55:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3901.7). Total num frames: 1347584. Throughput: 0: 931.1. Samples: 337320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:55:24,935][00191] Avg episode reward: [(0, '7.648')]
-[2024-09-29 15:55:24,939][05153] Saving new best policy, reward=7.648!
-[2024-09-29 15:55:25,734][05166] Updated weights for policy 0, policy_version 330 (0.0033)
-[2024-09-29 15:55:29,931][00191] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3860.0). Total num frames: 1359872. Throughput: 0: 915.7. Samples: 339434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:55:29,934][00191] Avg episode reward: [(0, '7.365')]
-[2024-09-29 15:55:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3901.6). Total num frames: 1384448. Throughput: 0: 871.2. Samples: 344426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:55:34,934][00191] Avg episode reward: [(0, '7.321')]
-[2024-09-29 15:55:36,612][05166] Updated weights for policy 0, policy_version 340 (0.0021)
-[2024-09-29 15:55:39,931][00191] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 1404928. Throughput: 0: 910.3. Samples: 351506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:55:39,937][00191] Avg episode reward: [(0, '7.568')]
-[2024-09-29 15:55:44,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1421312. Throughput: 0: 933.8. Samples: 354694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:55:44,936][00191] Avg episode reward: [(0, '7.919')]
-[2024-09-29 15:55:44,943][05153] Saving new best policy, reward=7.919!
-[2024-09-29 15:55:48,215][05166] Updated weights for policy 0, policy_version 350 (0.0025)
-[2024-09-29 15:55:49,931][00191] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3873.8). Total num frames: 1437696. Throughput: 0: 879.9. Samples: 358830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:55:49,933][00191] Avg episode reward: [(0, '8.108')]
-[2024-09-29 15:55:49,947][05153] Saving new best policy, reward=8.108!
-[2024-09-29 15:55:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1462272. Throughput: 0: 884.7. Samples: 365350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:55:54,936][00191] Avg episode reward: [(0, '8.057')]
-[2024-09-29 15:55:57,189][05166] Updated weights for policy 0, policy_version 360 (0.0031)
-[2024-09-29 15:55:59,931][00191] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 1482752. Throughput: 0: 940.2. Samples: 368884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:55:59,935][00191] Avg episode reward: [(0, '8.826')]
-[2024-09-29 15:55:59,946][05153] Saving new best policy, reward=8.826!
-[2024-09-29 15:56:04,932][00191] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3860.0). Total num frames: 1495040. Throughput: 0: 967.8. Samples: 373888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:56:04,938][00191] Avg episode reward: [(0, '8.433')]
-[2024-09-29 15:56:08,956][05166] Updated weights for policy 0, policy_version 370 (0.0022)
-[2024-09-29 15:56:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1519616. Throughput: 0: 936.4. Samples: 379456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:56:09,933][00191] Avg episode reward: [(0, '9.402')]
-[2024-09-29 15:56:09,944][05153] Saving new best policy, reward=9.402!
-[2024-09-29 15:56:14,931][00191] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1540096. Throughput: 0: 967.4. Samples: 382968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-29 15:56:14,933][00191] Avg episode reward: [(0, '9.397')]
-[2024-09-29 15:56:18,908][05166] Updated weights for policy 0, policy_version 380 (0.0033)
-[2024-09-29 15:56:19,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 1556480. Throughput: 0: 991.0. Samples: 389022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:56:19,937][00191] Avg episode reward: [(0, '9.600')]
-[2024-09-29 15:56:19,949][05153] Saving new best policy, reward=9.600!
-[2024-09-29 15:56:24,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1572864. Throughput: 0: 935.0. Samples: 393580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:56:24,933][00191] Avg episode reward: [(0, '10.175')]
-[2024-09-29 15:56:24,940][05153] Saving new best policy, reward=10.175!
-[2024-09-29 15:56:29,472][05166] Updated weights for policy 0, policy_version 390 (0.0025)
-[2024-09-29 15:56:29,931][00191] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1597440. Throughput: 0: 941.1. Samples: 397042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:56:29,936][00191] Avg episode reward: [(0, '9.817')]
-[2024-09-29 15:56:34,931][00191] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1617920. Throughput: 0: 998.4. Samples: 403758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:56:34,938][00191] Avg episode reward: [(0, '10.396')]
-[2024-09-29 15:56:34,944][05153] Saving new best policy, reward=10.396!
-[2024-09-29 15:56:39,934][00191] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3859.9). Total num frames: 1630208. Throughput: 0: 948.7. Samples: 408042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-29 15:56:39,935][00191] Avg episode reward: [(0, '10.262')]
-[2024-09-29 15:56:41,263][05166] Updated weights for policy 0, policy_version 400 (0.0028)
-[2024-09-29 15:56:44,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1654784. Throughput: 0: 937.4. Samples: 411066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:56:44,938][00191] Avg episode reward: [(0, '10.518')]
-[2024-09-29 15:56:44,941][05153] Saving new best policy, reward=10.518!
-[2024-09-29 15:56:49,931][00191] Fps is (10 sec: 4506.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1675264. Throughput: 0: 980.2. Samples: 417998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:56:49,938][00191] Avg episode reward: [(0, '10.367')]
-[2024-09-29 15:56:50,064][05166] Updated weights for policy 0, policy_version 410 (0.0037)
-[2024-09-29 15:56:54,931][00191] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1691648. Throughput: 0: 974.0. Samples: 423286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:56:54,936][00191] Avg episode reward: [(0, '10.617')]
-[2024-09-29 15:56:54,939][05153] Saving new best policy, reward=10.617!
-[2024-09-29 15:56:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1712128. Throughput: 0: 944.6. Samples: 425476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:56:59,933][00191] Avg episode reward: [(0, '10.446')]
-[2024-09-29 15:56:59,940][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth...
-[2024-09-29 15:57:00,061][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth
-[2024-09-29 15:57:01,540][05166] Updated weights for policy 0, policy_version 420 (0.0028)
-[2024-09-29 15:57:04,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1732608. Throughput: 0: 964.4. Samples: 432418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:04,933][00191] Avg episode reward: [(0, '10.889')]
-[2024-09-29 15:57:04,936][05153] Saving new best policy, reward=10.889!
-[2024-09-29 15:57:09,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1753088. Throughput: 0: 997.7. Samples: 438476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:57:09,936][00191] Avg episode reward: [(0, '11.505')]
-[2024-09-29 15:57:09,949][05153] Saving new best policy, reward=11.505!
-[2024-09-29 15:57:12,491][05166] Updated weights for policy 0, policy_version 430 (0.0043)
-[2024-09-29 15:57:14,933][00191] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 1769472. Throughput: 0: 964.8. Samples: 440458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:57:14,935][00191] Avg episode reward: [(0, '12.250')]
-[2024-09-29 15:57:14,939][05153] Saving new best policy, reward=12.250!
-[2024-09-29 15:57:19,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1789952. Throughput: 0: 948.7. Samples: 446450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:19,938][00191] Avg episode reward: [(0, '13.159')]
-[2024-09-29 15:57:19,951][05153] Saving new best policy, reward=13.159!
-[2024-09-29 15:57:21,788][05166] Updated weights for policy 0, policy_version 440 (0.0025)
-[2024-09-29 15:57:24,936][00191] Fps is (10 sec: 4504.4, 60 sec: 4027.4, 300 sec: 3887.7). Total num frames: 1814528. Throughput: 0: 1012.0. Samples: 453582. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-09-29 15:57:24,941][00191] Avg episode reward: [(0, '13.509')]
-[2024-09-29 15:57:24,949][05153] Saving new best policy, reward=13.509!
-[2024-09-29 15:57:29,931][00191] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1826816. Throughput: 0: 991.1. Samples: 455666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:57:29,933][00191] Avg episode reward: [(0, '13.593')]
-[2024-09-29 15:57:29,941][05153] Saving new best policy, reward=13.593!
-[2024-09-29 15:57:33,288][05166] Updated weights for policy 0, policy_version 450 (0.0032)
-[2024-09-29 15:57:34,931][00191] Fps is (10 sec: 3278.3, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 1847296. Throughput: 0: 954.5. Samples: 460950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:57:34,934][00191] Avg episode reward: [(0, '12.714')]
-[2024-09-29 15:57:39,931][00191] Fps is (10 sec: 4505.7, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 1871872. Throughput: 0: 998.6. Samples: 468222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:39,937][00191] Avg episode reward: [(0, '12.889')]
-[2024-09-29 15:57:42,487][05166] Updated weights for policy 0, policy_version 460 (0.0038)
-[2024-09-29 15:57:44,934][00191] Fps is (10 sec: 4094.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1888256. Throughput: 0: 1018.5. Samples: 471312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:44,941][00191] Avg episode reward: [(0, '12.416')]
-[2024-09-29 15:57:49,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1908736. Throughput: 0: 958.3. Samples: 475540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:49,934][00191] Avg episode reward: [(0, '12.715')]
-[2024-09-29 15:57:53,377][05166] Updated weights for policy 0, policy_version 470 (0.0028)
-[2024-09-29 15:57:54,931][00191] Fps is (10 sec: 4097.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1929216. Throughput: 0: 982.1. Samples: 482670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:54,938][00191] Avg episode reward: [(0, '13.395')]
-[2024-09-29 15:57:59,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1949696. Throughput: 0: 1017.8. Samples: 486256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:57:59,934][00191] Avg episode reward: [(0, '13.971')]
-[2024-09-29 15:58:00,042][05153] Saving new best policy, reward=13.971!
-[2024-09-29 15:58:04,357][05166] Updated weights for policy 0, policy_version 480 (0.0034)
-[2024-09-29 15:58:04,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1966080. Throughput: 0: 990.7. Samples: 491032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:58:04,935][00191] Avg episode reward: [(0, '14.377')]
-[2024-09-29 15:58:04,939][05153] Saving new best policy, reward=14.377!
-[2024-09-29 15:58:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1986560. Throughput: 0: 969.8. Samples: 497218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:09,939][00191] Avg episode reward: [(0, '15.001')]
-[2024-09-29 15:58:09,983][05153] Saving new best policy, reward=15.001!
-[2024-09-29 15:58:13,410][05166] Updated weights for policy 0, policy_version 490 (0.0021)
-[2024-09-29 15:58:14,931][00191] Fps is (10 sec: 4505.9, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 2011136. Throughput: 0: 999.5. Samples: 500644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:14,938][00191] Avg episode reward: [(0, '15.733')]
-[2024-09-29 15:58:14,943][05153] Saving new best policy, reward=15.733!
-[2024-09-29 15:58:19,933][00191] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 2027520. Throughput: 0: 1009.1. Samples: 506360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:19,936][00191] Avg episode reward: [(0, '15.831')]
-[2024-09-29 15:58:19,953][05153] Saving new best policy, reward=15.831!
-[2024-09-29 15:58:24,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3860.0). Total num frames: 2043904. Throughput: 0: 959.2. Samples: 511388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:24,939][00191] Avg episode reward: [(0, '17.348')]
-[2024-09-29 15:58:24,941][05153] Saving new best policy, reward=17.348!
-[2024-09-29 15:58:25,221][05166] Updated weights for policy 0, policy_version 500 (0.0027)
-[2024-09-29 15:58:29,931][00191] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2068480. Throughput: 0: 967.8. Samples: 514862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:29,934][00191] Avg episode reward: [(0, '18.564')]
-[2024-09-29 15:58:29,944][05153] Saving new best policy, reward=18.564!
-[2024-09-29 15:58:34,364][05166] Updated weights for policy 0, policy_version 510 (0.0034)
-[2024-09-29 15:58:34,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2088960. Throughput: 0: 1025.9. Samples: 521706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:58:34,934][00191] Avg episode reward: [(0, '18.982')]
-[2024-09-29 15:58:34,935][05153] Saving new best policy, reward=18.982!
-[2024-09-29 15:58:39,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2105344. Throughput: 0: 963.6. Samples: 526030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:39,936][00191] Avg episode reward: [(0, '19.105')]
-[2024-09-29 15:58:39,945][05153] Saving new best policy, reward=19.105!
-[2024-09-29 15:58:44,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 2125824. Throughput: 0: 958.3. Samples: 529380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:58:44,935][00191] Avg episode reward: [(0, '20.089')]
-[2024-09-29 15:58:44,940][05153] Saving new best policy, reward=20.089!
-[2024-09-29 15:58:45,182][05166] Updated weights for policy 0, policy_version 520 (0.0030)
-[2024-09-29 15:58:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2150400. Throughput: 0: 1005.3. Samples: 536268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:58:49,933][00191] Avg episode reward: [(0, '18.359')]
-[2024-09-29 15:58:54,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2162688. Throughput: 0: 978.2. Samples: 541236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:58:54,936][00191] Avg episode reward: [(0, '18.787')]
-[2024-09-29 15:58:56,478][05166] Updated weights for policy 0, policy_version 530 (0.0039)
-[2024-09-29 15:58:59,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2183168. Throughput: 0: 960.3. Samples: 543858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:58:59,940][00191] Avg episode reward: [(0, '19.080')]
-[2024-09-29 15:58:59,960][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000534_2187264.pth...
-[2024-09-29 15:59:00,096][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth
-[2024-09-29 15:59:04,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3873.8). Total num frames: 2207744. Throughput: 0: 990.6. Samples: 550936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:59:04,939][00191] Avg episode reward: [(0, '19.559')]
-[2024-09-29 15:59:05,248][05166] Updated weights for policy 0, policy_version 540 (0.0035)
-[2024-09-29 15:59:09,931][00191] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2224128. Throughput: 0: 1005.6. Samples: 556640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:59:09,934][00191] Avg episode reward: [(0, '19.132')]
-[2024-09-29 15:59:14,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.1). Total num frames: 2240512. Throughput: 0: 976.6. Samples: 558810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:59:14,934][00191] Avg episode reward: [(0, '19.595')]
-[2024-09-29 15:59:16,566][05166] Updated weights for policy 0, policy_version 550 (0.0014)
-[2024-09-29 15:59:19,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 2265088. Throughput: 0: 968.7. Samples: 565298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:59:19,936][00191] Avg episode reward: [(0, '21.026')]
-[2024-09-29 15:59:19,947][05153] Saving new best policy, reward=21.026!
-[2024-09-29 15:59:24,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2285568. Throughput: 0: 1023.9. Samples: 572106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:59:24,934][00191] Avg episode reward: [(0, '19.383')]
-[2024-09-29 15:59:26,559][05166] Updated weights for policy 0, policy_version 560 (0.0050)
-[2024-09-29 15:59:29,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2301952. Throughput: 0: 996.9. Samples: 574240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:59:29,935][00191] Avg episode reward: [(0, '19.259')]
-[2024-09-29 15:59:34,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2326528. Throughput: 0: 973.8. Samples: 580090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:59:34,933][00191] Avg episode reward: [(0, '18.192')]
-[2024-09-29 15:59:36,556][05166] Updated weights for policy 0, policy_version 570 (0.0022)
-[2024-09-29 15:59:39,931][00191] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2351104. Throughput: 0: 1024.2. Samples: 587326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 15:59:39,934][00191] Avg episode reward: [(0, '18.463')]
-[2024-09-29 15:59:44,939][00191] Fps is (10 sec: 3683.6, 60 sec: 3959.0, 300 sec: 3859.9). Total num frames: 2363392. Throughput: 0: 1026.1. Samples: 590042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:59:44,945][00191] Avg episode reward: [(0, '18.205')]
-[2024-09-29 15:59:47,740][05166] Updated weights for policy 0, policy_version 580 (0.0026)
-[2024-09-29 15:59:49,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2383872. Throughput: 0: 973.3. Samples: 594736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 15:59:49,933][00191] Avg episode reward: [(0, '18.728')]
-[2024-09-29 15:59:54,931][00191] Fps is (10 sec: 4509.0, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2408448. Throughput: 0: 1007.5. Samples: 601976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 15:59:54,934][00191] Avg episode reward: [(0, '20.152')]
-[2024-09-29 15:59:56,201][05166] Updated weights for policy 0, policy_version 590 (0.0017)
-[2024-09-29 15:59:59,933][00191] Fps is (10 sec: 4504.6, 60 sec: 4095.9, 300 sec: 3887.7). Total num frames: 2428928. Throughput: 0: 1040.5. Samples: 605636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-29 15:59:59,936][00191] Avg episode reward: [(0, '20.679')]
-[2024-09-29 16:00:04,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2445312. Throughput: 0: 992.9. Samples: 609980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:00:04,934][00191] Avg episode reward: [(0, '20.194')]
-[2024-09-29 16:00:07,486][05166] Updated weights for policy 0, policy_version 600 (0.0065)
-[2024-09-29 16:00:09,931][00191] Fps is (10 sec: 3687.2, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2465792. Throughput: 0: 992.6. Samples: 616772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:00:09,933][00191] Avg episode reward: [(0, '20.440')]
-[2024-09-29 16:00:14,937][00191] Fps is (10 sec: 4503.0, 60 sec: 4163.9, 300 sec: 3943.2). Total num frames: 2490368. Throughput: 0: 1024.6. Samples: 620354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:00:14,942][00191] Avg episode reward: [(0, '20.656')]
-[2024-09-29 16:00:17,073][05166] Updated weights for policy 0, policy_version 610 (0.0015)
-[2024-09-29 16:00:19,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2502656. Throughput: 0: 1015.6. Samples: 625794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:00:19,941][00191] Avg episode reward: [(0, '21.101')]
-[2024-09-29 16:00:19,975][05153] Saving new best policy, reward=21.101!
-[2024-09-29 16:00:24,934][00191] Fps is (10 sec: 2868.1, 60 sec: 3891.0, 300 sec: 3929.3). Total num frames: 2519040. Throughput: 0: 943.5. Samples: 629788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:00:24,938][00191] Avg episode reward: [(0, '20.314')]
-[2024-09-29 16:00:28,694][05166] Updated weights for policy 0, policy_version 620 (0.0037)
-[2024-09-29 16:00:29,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2543616. Throughput: 0: 962.7. Samples: 633358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:00:29,933][00191] Avg episode reward: [(0, '19.434')]
-[2024-09-29 16:00:34,931][00191] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2560000. Throughput: 0: 1002.5. Samples: 639848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:00:34,935][00191] Avg episode reward: [(0, '20.418')]
-[2024-09-29 16:00:39,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2576384. Throughput: 0: 948.7. Samples: 644666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 16:00:39,933][00191] Avg episode reward: [(0, '19.475')]
-[2024-09-29 16:00:39,959][05166] Updated weights for policy 0, policy_version 630 (0.0039)
-[2024-09-29 16:00:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3960.0, 300 sec: 3943.3). Total num frames: 2600960. Throughput: 0: 945.5. Samples: 648182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:00:44,933][00191] Avg episode reward: [(0, '19.752')]
-[2024-09-29 16:00:48,490][05166] Updated weights for policy 0, policy_version 640 (0.0053)
-[2024-09-29 16:00:49,932][00191] Fps is (10 sec: 4914.9, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2625536. Throughput: 0: 1009.8. Samples: 655422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:00:49,937][00191] Avg episode reward: [(0, '20.897')]
-[2024-09-29 16:00:54,933][00191] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3915.5). Total num frames: 2637824. Throughput: 0: 959.0. Samples: 659928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:00:54,935][00191] Avg episode reward: [(0, '20.980')]
-[2024-09-29 16:00:59,785][05166] Updated weights for policy 0, policy_version 650 (0.0041)
-[2024-09-29 16:00:59,931][00191] Fps is (10 sec: 3686.7, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 2662400. Throughput: 0: 945.9. Samples: 662916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:00:59,938][00191] Avg episode reward: [(0, '20.267')]
-[2024-09-29 16:00:59,950][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000650_2662400.pth...
-[2024-09-29 16:01:00,074][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth
-[2024-09-29 16:01:04,931][00191] Fps is (10 sec: 4506.3, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2682880. Throughput: 0: 980.6. Samples: 669920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:01:04,937][00191] Avg episode reward: [(0, '21.667')]
-[2024-09-29 16:01:04,942][05153] Saving new best policy, reward=21.667!
-[2024-09-29 16:01:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2699264. Throughput: 0: 1007.2. Samples: 675108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:01:09,934][00191] Avg episode reward: [(0, '22.730')]
-[2024-09-29 16:01:09,949][05153] Saving new best policy, reward=22.730!
-[2024-09-29 16:01:10,851][05166] Updated weights for policy 0, policy_version 660 (0.0023)
-[2024-09-29 16:01:14,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3943.3). Total num frames: 2719744. Throughput: 0: 974.7. Samples: 677220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:01:14,937][00191] Avg episode reward: [(0, '23.875')]
-[2024-09-29 16:01:14,939][05153] Saving new best policy, reward=23.875!
-[2024-09-29 16:01:19,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2740224. Throughput: 0: 986.9. Samples: 684260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:01:19,938][00191] Avg episode reward: [(0, '24.704')]
-[2024-09-29 16:01:19,956][05153] Saving new best policy, reward=24.704!
-[2024-09-29 16:01:20,196][05166] Updated weights for policy 0, policy_version 670 (0.0035)
-[2024-09-29 16:01:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3943.3). Total num frames: 2760704. Throughput: 0: 1015.2. Samples: 690348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:01:24,937][00191] Avg episode reward: [(0, '24.417')]
-[2024-09-29 16:01:29,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2772992. Throughput: 0: 982.0. Samples: 692372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:01:29,934][00191] Avg episode reward: [(0, '24.873')]
-[2024-09-29 16:01:29,945][05153] Saving new best policy, reward=24.873!
-[2024-09-29 16:01:32,042][05166] Updated weights for policy 0, policy_version 680 (0.0024)
-[2024-09-29 16:01:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2797568. Throughput: 0: 947.6. Samples: 698064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:01:34,934][00191] Avg episode reward: [(0, '21.637')]
-[2024-09-29 16:01:39,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2818048. Throughput: 0: 1009.4. Samples: 705348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 16:01:39,936][00191] Avg episode reward: [(0, '20.137')]
-[2024-09-29 16:01:41,455][05166] Updated weights for policy 0, policy_version 690 (0.0024)
-[2024-09-29 16:01:44,931][00191] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2834432. Throughput: 0: 991.3. Samples: 707524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-29 16:01:44,934][00191] Avg episode reward: [(0, '19.924')]
-[2024-09-29 16:01:49,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 2854912. Throughput: 0: 956.0. Samples: 712938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-29 16:01:49,938][00191] Avg episode reward: [(0, '19.015')]
-[2024-09-29 16:01:51,811][05166] Updated weights for policy 0, policy_version 700 (0.0027)
-[2024-09-29 16:01:54,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3957.1). Total num frames: 2879488. Throughput: 0: 998.7. Samples: 720052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-29 16:01:54,938][00191] Avg episode reward: [(0, '18.696')]
-[2024-09-29 16:01:59,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2895872. Throughput: 0: 1021.3. Samples: 723178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-29 16:01:59,938][00191] Avg episode reward: [(0, '18.821')]
-[2024-09-29 16:02:03,008][05166] Updated weights for policy 0, policy_version 710 (0.0015)
-[2024-09-29 16:02:04,931][00191] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2916352. Throughput: 0: 959.7. Samples: 727448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-29 16:02:04,933][00191] Avg episode reward: [(0, '19.998')]
-[2024-09-29 16:02:09,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2936832. Throughput: 0: 980.2. Samples: 734458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:02:09,934][00191] Avg episode reward: [(0, '20.637')]
-[2024-09-29 16:02:11,968][05166] Updated weights for policy 0, policy_version 720 (0.0030)
-[2024-09-29 16:02:14,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2957312. Throughput: 0: 1009.4. Samples: 737796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-29 16:02:14,935][00191] Avg episode reward: [(0, '22.813')]
-[2024-09-29 16:02:19,935][00191] Fps is (10 sec: 3275.7, 60 sec: 3822.7, 300 sec: 3915.5). Total num frames: 2969600. Throughput: 0: 987.0. Samples: 742482.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-09-29 16:02:19,938][00191] Avg episode reward: [(0, '22.809')] -[2024-09-29 16:02:23,534][05166] Updated weights for policy 0, policy_version 730 (0.0039) -[2024-09-29 16:02:24,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2994176. Throughput: 0: 961.0. Samples: 748594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:02:24,935][00191] Avg episode reward: [(0, '23.452')] -[2024-09-29 16:02:29,931][00191] Fps is (10 sec: 4916.8, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3018752. Throughput: 0: 991.3. Samples: 752134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-09-29 16:02:29,934][00191] Avg episode reward: [(0, '21.245')] -[2024-09-29 16:02:33,493][05166] Updated weights for policy 0, policy_version 740 (0.0045) -[2024-09-29 16:02:34,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3035136. Throughput: 0: 1002.0. Samples: 758030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:02:34,939][00191] Avg episode reward: [(0, '20.993')] -[2024-09-29 16:02:39,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3051520. Throughput: 0: 961.6. Samples: 763322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:02:39,937][00191] Avg episode reward: [(0, '19.853')] -[2024-09-29 16:02:43,513][05166] Updated weights for policy 0, policy_version 750 (0.0033) -[2024-09-29 16:02:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3076096. Throughput: 0: 970.5. Samples: 766852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:02:44,937][00191] Avg episode reward: [(0, '21.536')] -[2024-09-29 16:02:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3096576. Throughput: 0: 1026.0. Samples: 773620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:02:49,936][00191] Avg episode reward: [(0, '22.186')] -[2024-09-29 16:02:54,905][05166] Updated weights for policy 0, policy_version 760 (0.0018) -[2024-09-29 16:02:54,931][00191] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3112960. Throughput: 0: 964.0. Samples: 777838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:02:54,940][00191] Avg episode reward: [(0, '23.164')] -[2024-09-29 16:02:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3133440. Throughput: 0: 967.6. Samples: 781338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-09-29 16:02:59,934][00191] Avg episode reward: [(0, '22.492')] -[2024-09-29 16:02:59,974][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth... -[2024-09-29 16:03:00,096][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000534_2187264.pth -[2024-09-29 16:03:03,472][05166] Updated weights for policy 0, policy_version 770 (0.0021) -[2024-09-29 16:03:04,933][00191] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3971.0). Total num frames: 3158016. Throughput: 0: 1021.6. Samples: 788452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-09-29 16:03:04,935][00191] Avg episode reward: [(0, '23.943')] -[2024-09-29 16:03:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3170304. Throughput: 0: 994.7. Samples: 793354. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:03:09,940][00191] Avg episode reward: [(0, '22.023')] -[2024-09-29 16:03:14,764][05166] Updated weights for policy 0, policy_version 780 (0.0045) -[2024-09-29 16:03:14,931][00191] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3194880. Throughput: 0: 972.9. Samples: 795914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:03:14,933][00191] Avg episode reward: [(0, '20.818')] -[2024-09-29 16:03:19,931][00191] Fps is (10 sec: 4915.3, 60 sec: 4164.5, 300 sec: 3984.9). Total num frames: 3219456. Throughput: 0: 1003.2. Samples: 803172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-09-29 16:03:19,933][00191] Avg episode reward: [(0, '21.393')] -[2024-09-29 16:03:24,773][05166] Updated weights for policy 0, policy_version 790 (0.0025) -[2024-09-29 16:03:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3235840. Throughput: 0: 1012.2. Samples: 808870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:03:24,935][00191] Avg episode reward: [(0, '21.795')] -[2024-09-29 16:03:29,931][00191] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3252224. Throughput: 0: 981.3. Samples: 811010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-09-29 16:03:29,938][00191] Avg episode reward: [(0, '22.471')] -[2024-09-29 16:03:34,783][05166] Updated weights for policy 0, policy_version 800 (0.0035) -[2024-09-29 16:03:34,931][00191] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3276800. Throughput: 0: 978.3. Samples: 817642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:03:34,939][00191] Avg episode reward: [(0, '21.974')] -[2024-09-29 16:03:39,933][00191] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 3297280. Throughput: 0: 1037.9. Samples: 824544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:03:39,935][00191] Avg episode reward: [(0, '22.485')] -[2024-09-29 16:03:44,931][00191] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3309568. Throughput: 0: 1006.9. Samples: 826650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:03:44,934][00191] Avg episode reward: [(0, '22.544')] -[2024-09-29 16:03:46,222][05166] Updated weights for policy 0, policy_version 810 (0.0049) -[2024-09-29 16:03:49,931][00191] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3334144. Throughput: 0: 969.7. Samples: 832088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:03:49,934][00191] Avg episode reward: [(0, '23.169')] -[2024-09-29 16:03:54,922][05166] Updated weights for policy 0, policy_version 820 (0.0043) -[2024-09-29 16:03:54,931][00191] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3358720. Throughput: 0: 1018.5. Samples: 839186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:03:54,937][00191] Avg episode reward: [(0, '22.466')] -[2024-09-29 16:03:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3371008. Throughput: 0: 1027.9. Samples: 842170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:03:59,937][00191] Avg episode reward: [(0, '23.727')] -[2024-09-29 16:04:04,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 3391488. Throughput: 0: 964.8. Samples: 846586. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:04:04,940][00191] Avg episode reward: [(0, '23.965')] -[2024-09-29 16:04:06,303][05166] Updated weights for policy 0, policy_version 830 (0.0048) -[2024-09-29 16:04:09,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3416064. Throughput: 0: 996.2. Samples: 853698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:04:09,934][00191] Avg episode reward: [(0, '25.442')] -[2024-09-29 16:04:09,945][05153] Saving new best policy, reward=25.442! -[2024-09-29 16:04:14,933][00191] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3432448. Throughput: 0: 1027.3. Samples: 857238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:04:14,940][00191] Avg episode reward: [(0, '25.055')] -[2024-09-29 16:04:16,566][05166] Updated weights for policy 0, policy_version 840 (0.0026) -[2024-09-29 16:04:19,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3448832. Throughput: 0: 980.5. Samples: 861766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:04:19,938][00191] Avg episode reward: [(0, '24.460')] -[2024-09-29 16:04:24,931][00191] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3473408. Throughput: 0: 961.1. Samples: 867792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:04:24,938][00191] Avg episode reward: [(0, '21.985')] -[2024-09-29 16:04:26,632][05166] Updated weights for policy 0, policy_version 850 (0.0032) -[2024-09-29 16:04:29,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3493888. Throughput: 0: 992.6. Samples: 871318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:04:29,935][00191] Avg episode reward: [(0, '20.985')] -[2024-09-29 16:04:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3510272. Throughput: 0: 999.3. Samples: 877056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:04:34,935][00191] Avg episode reward: [(0, '19.961')] -[2024-09-29 16:04:37,982][05166] Updated weights for policy 0, policy_version 860 (0.0025) -[2024-09-29 16:04:39,931][00191] Fps is (10 sec: 3686.3, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 3530752. Throughput: 0: 958.9. Samples: 882336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:04:39,938][00191] Avg episode reward: [(0, '19.138')] -[2024-09-29 16:04:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3551232. Throughput: 0: 969.7. Samples: 885806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:04:44,934][00191] Avg episode reward: [(0, '19.395')] -[2024-09-29 16:04:46,889][05166] Updated weights for policy 0, policy_version 870 (0.0032) -[2024-09-29 16:04:49,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3571712. Throughput: 0: 1018.7. Samples: 892428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-09-29 16:04:49,935][00191] Avg episode reward: [(0, '19.286')] -[2024-09-29 16:04:54,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3584000. Throughput: 0: 953.9. Samples: 896624. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:04:54,933][00191] Avg episode reward: [(0, '19.727')] -[2024-09-29 16:04:58,301][05166] Updated weights for policy 0, policy_version 880 (0.0046) -[2024-09-29 16:04:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3608576. Throughput: 0: 954.2. Samples: 900174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:04:59,934][00191] Avg episode reward: [(0, '21.502')] -[2024-09-29 16:04:59,947][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000881_3608576.pth... -[2024-09-29 16:05:00,096][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000650_2662400.pth -[2024-09-29 16:05:04,931][00191] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3633152. Throughput: 0: 1005.3. Samples: 907006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:05:04,937][00191] Avg episode reward: [(0, '21.333')] -[2024-09-29 16:05:09,332][05166] Updated weights for policy 0, policy_version 890 (0.0030) -[2024-09-29 16:05:09,934][00191] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3915.5). Total num frames: 3645440. Throughput: 0: 977.1. Samples: 911762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-09-29 16:05:09,940][00191] Avg episode reward: [(0, '22.647')] -[2024-09-29 16:05:14,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 3665920. Throughput: 0: 957.5. Samples: 914404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-09-29 16:05:14,936][00191] Avg episode reward: [(0, '24.271')] -[2024-09-29 16:05:18,489][05166] Updated weights for policy 0, policy_version 900 (0.0033) -[2024-09-29 16:05:19,935][00191] Fps is (10 sec: 4505.1, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 3690496. Throughput: 0: 988.6. Samples: 921546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-09-29 16:05:19,938][00191] Avg episode reward: [(0, '24.812')] -[2024-09-29 16:05:24,933][00191] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3943.2). Total num frames: 3706880. Throughput: 0: 997.6. Samples: 927228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:05:24,943][00191] Avg episode reward: [(0, '24.405')] -[2024-09-29 16:05:29,931][00191] Fps is (10 sec: 3278.0, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3723264. Throughput: 0: 968.0. Samples: 929366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-09-29 16:05:29,940][00191] Avg episode reward: [(0, '23.616')] -[2024-09-29 16:05:30,181][05166] Updated weights for policy 0, policy_version 910 (0.0043) -[2024-09-29 16:05:34,931][00191] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3747840. Throughput: 0: 967.6. Samples: 935970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-09-29 16:05:34,937][00191] Avg episode reward: [(0, '24.092')] -[2024-09-29 16:05:38,814][05166] Updated weights for policy 0, policy_version 920 (0.0013) -[2024-09-29 16:05:39,938][00191] Fps is (10 sec: 4502.6, 60 sec: 3959.1, 300 sec: 3957.1). Total num frames: 3768320. Throughput: 0: 1026.1. Samples: 942806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-09-29 16:05:39,942][00191] Avg episode reward: [(0, '22.505')] -[2024-09-29 16:05:44,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3784704. Throughput: 0: 994.2. Samples: 944914. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:05:44,937][00191] Avg episode reward: [(0, '21.239')] -[2024-09-29 16:05:49,695][05166] Updated weights for policy 0, policy_version 930 (0.0022) -[2024-09-29 16:05:49,931][00191] Fps is (10 sec: 4098.7, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 3809280. Throughput: 0: 971.0. Samples: 950700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:05:49,934][00191] Avg episode reward: [(0, '23.528')] -[2024-09-29 16:05:54,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3829760. Throughput: 0: 1021.3. Samples: 957720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-09-29 16:05:54,934][00191] Avg episode reward: [(0, '22.984')] -[2024-09-29 16:05:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3846144. Throughput: 0: 1021.0. Samples: 960350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:05:59,933][00191] Avg episode reward: [(0, '21.638')] -[2024-09-29 16:06:00,784][05166] Updated weights for policy 0, policy_version 940 (0.0035) -[2024-09-29 16:06:04,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3866624. Throughput: 0: 967.6. Samples: 965086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:06:04,936][00191] Avg episode reward: [(0, '23.247')] -[2024-09-29 16:06:09,827][05166] Updated weights for policy 0, policy_version 950 (0.0021) -[2024-09-29 16:06:09,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 3971.0). Total num frames: 3891200. Throughput: 0: 1001.0. Samples: 972270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:06:09,934][00191] Avg episode reward: [(0, '23.005')] -[2024-09-29 16:06:14,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3907584. Throughput: 0: 1033.2. Samples: 975860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-09-29 16:06:14,934][00191] Avg episode reward: [(0, '23.563')] -[2024-09-29 16:06:19,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3943.3). Total num frames: 3923968. Throughput: 0: 985.6. Samples: 980324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-09-29 16:06:19,933][00191] Avg episode reward: [(0, '23.896')] -[2024-09-29 16:06:21,159][05166] Updated weights for policy 0, policy_version 960 (0.0040) -[2024-09-29 16:06:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 3948544. Throughput: 0: 980.8. Samples: 986934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-09-29 16:06:24,934][00191] Avg episode reward: [(0, '26.992')] -[2024-09-29 16:06:24,939][05153] Saving new best policy, reward=26.992! -[2024-09-29 16:06:29,931][00191] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3969024. Throughput: 0: 1012.5. Samples: 990476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-09-29 16:06:29,937][00191] Avg episode reward: [(0, '25.832')] -[2024-09-29 16:06:30,212][05166] Updated weights for policy 0, policy_version 970 (0.0030) -[2024-09-29 16:06:34,933][00191] Fps is (10 sec: 3685.6, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 3985408. Throughput: 0: 998.5. Samples: 995634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-09-29 16:06:34,936][00191] Avg episode reward: [(0, '25.356')] -[2024-09-29 16:06:39,144][05153] Stopping Batcher_0... -[2024-09-29 16:06:39,146][05153] Loop batcher_evt_loop terminating... 
-[2024-09-29 16:06:39,147][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-09-29 16:06:39,155][00191] Component Batcher_0 stopped!
-[2024-09-29 16:06:39,227][05166] Weights refcount: 2 0
-[2024-09-29 16:06:39,242][05166] Stopping InferenceWorker_p0-w0...
-[2024-09-29 16:06:39,243][05166] Loop inference_proc0-0_evt_loop terminating...
-[2024-09-29 16:06:39,247][00191] Component InferenceWorker_p0-w0 stopped!
-[2024-09-29 16:06:39,296][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth
-[2024-09-29 16:06:39,312][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-09-29 16:06:39,491][05153] Stopping LearnerWorker_p0...
-[2024-09-29 16:06:39,498][05153] Loop learner_proc0_evt_loop terminating...
-[2024-09-29 16:06:39,494][00191] Component LearnerWorker_p0 stopped!
-[2024-09-29 16:06:39,544][00191] Component RolloutWorker_w1 stopped!
-[2024-09-29 16:06:39,550][05167] Stopping RolloutWorker_w1...
-[2024-09-29 16:06:39,551][05167] Loop rollout_proc1_evt_loop terminating...
-[2024-09-29 16:06:39,576][00191] Component RolloutWorker_w5 stopped!
-[2024-09-29 16:06:39,581][05172] Stopping RolloutWorker_w5...
-[2024-09-29 16:06:39,582][05172] Loop rollout_proc5_evt_loop terminating...
-[2024-09-29 16:06:39,588][00191] Component RolloutWorker_w3 stopped!
-[2024-09-29 16:06:39,592][05169] Stopping RolloutWorker_w3...
-[2024-09-29 16:06:39,593][05169] Loop rollout_proc3_evt_loop terminating...
-[2024-09-29 16:06:39,602][00191] Component RolloutWorker_w7 stopped!
-[2024-09-29 16:06:39,606][05173] Stopping RolloutWorker_w7...
-[2024-09-29 16:06:39,606][05173] Loop rollout_proc7_evt_loop terminating...
-[2024-09-29 16:06:39,729][05170] Stopping RolloutWorker_w2...
-[2024-09-29 16:06:39,728][00191] Component RolloutWorker_w2 stopped!
-[2024-09-29 16:06:39,737][05170] Loop rollout_proc2_evt_loop terminating...
-[2024-09-29 16:06:39,739][05171] Stopping RolloutWorker_w4...
-[2024-09-29 16:06:39,738][00191] Component RolloutWorker_w4 stopped!
-[2024-09-29 16:06:39,746][05171] Loop rollout_proc4_evt_loop terminating...
-[2024-09-29 16:06:39,763][00191] Component RolloutWorker_w0 stopped!
-[2024-09-29 16:06:39,771][00191] Component RolloutWorker_w6 stopped!
-[2024-09-29 16:06:39,771][05174] Stopping RolloutWorker_w6...
-[2024-09-29 16:06:39,763][05168] Stopping RolloutWorker_w0...
-[2024-09-29 16:06:39,772][00191] Waiting for process learner_proc0 to stop...
-[2024-09-29 16:06:39,781][05168] Loop rollout_proc0_evt_loop terminating...
-[2024-09-29 16:06:39,773][05174] Loop rollout_proc6_evt_loop terminating...
-[2024-09-29 16:06:40,971][00191] Waiting for process inference_proc0-0 to join...
-[2024-09-29 16:06:40,973][00191] Waiting for process rollout_proc0 to join...
-[2024-09-29 16:06:43,003][00191] Waiting for process rollout_proc1 to join...
-[2024-09-29 16:06:43,004][00191] Waiting for process rollout_proc2 to join...
-[2024-09-29 16:06:43,011][00191] Waiting for process rollout_proc3 to join...
-[2024-09-29 16:06:43,012][00191] Waiting for process rollout_proc4 to join...
-[2024-09-29 16:06:43,015][00191] Waiting for process rollout_proc5 to join...
-[2024-09-29 16:06:43,016][00191] Waiting for process rollout_proc6 to join...
-[2024-09-29 16:06:43,018][00191] Waiting for process rollout_proc7 to join...
-[2024-09-29 16:06:43,019][00191] Batcher 0 profile tree view:
-batching: 28.2059, releasing_batches: 0.0396
-[2024-09-29 16:06:43,021][00191] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0024
-  wait_policy_total: 388.7123
-update_model: 9.1986
-  weight_update: 0.0034
-one_step: 0.0052
-  handle_policy_step: 592.6184
-    deserialize: 14.7357, stack: 3.1584, obs_to_device_normalize: 121.2650, forward: 313.3477, send_messages: 27.6779
-    prepare_outputs: 82.7950
-      to_cpu: 47.5294
-[2024-09-29 16:06:43,022][00191] Learner 0 profile tree view:
-misc: 0.0064, prepare_batch: 13.4630
-train: 74.9971
-  epoch_init: 0.0085, minibatch_init: 0.0064, losses_postprocess: 0.6486, kl_divergence: 0.6932, after_optimizer: 33.6219
-  calculate_losses: 27.1642
-    losses_init: 0.0039, forward_head: 1.3007, bptt_initial: 18.4265, tail: 1.0064, advantages_returns: 0.2818, losses: 3.8878
-    bptt: 1.9420
-      bptt_forward_core: 1.8378
-  update: 12.1432
-    clip: 0.9211
-[2024-09-29 16:06:43,024][00191] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.3493, enqueue_policy_requests: 88.6628, env_step: 811.2053, overhead: 12.0389, complete_rollouts: 7.0654
-save_policy_outputs: 19.0850
-  split_output_tensors: 7.6624
-[2024-09-29 16:06:43,025][00191] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.2917, enqueue_policy_requests: 93.0825, env_step: 801.9541, overhead: 12.2341, complete_rollouts: 6.6274
-save_policy_outputs: 19.1569
-  split_output_tensors: 7.5929
-[2024-09-29 16:06:43,026][00191] Loop Runner_EvtLoop terminating...
-[2024-09-29 16:06:43,028][00191] Runner profile tree view:
-main_loop: 1057.6423
-[2024-09-29 16:06:43,029][00191] Collected {0: 4005888}, FPS: 3787.6
-[2024-09-29 16:06:48,817][00191] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-09-29 16:06:48,819][00191] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-09-29 16:06:48,821][00191] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-09-29 16:06:48,824][00191] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-09-29 16:06:48,825][00191] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-09-29 16:06:48,827][00191] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-09-29 16:06:48,829][00191] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2024-09-29 16:06:48,830][00191] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-09-29 16:06:48,831][00191] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2024-09-29 16:06:48,832][00191] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2024-09-29 16:06:48,834][00191] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-09-29 16:06:48,835][00191] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-09-29 16:06:48,836][00191] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-09-29 16:06:48,837][00191] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-09-29 16:06:48,838][00191] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-09-29 16:06:48,872][00191] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-29 16:06:48,875][00191] RunningMeanStd input shape: (3, 72, 128)
-[2024-09-29 16:06:48,877][00191] RunningMeanStd input shape: (1,)
-[2024-09-29 16:06:48,894][00191] ConvEncoder: input_channels=3
-[2024-09-29 16:06:48,996][00191] Conv encoder output size: 512
-[2024-09-29 16:06:48,997][00191] Policy head output size: 512
-[2024-09-29 16:06:49,269][00191] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-09-29 16:06:50,090][00191] Num frames 100...
-[2024-09-29 16:06:50,213][00191] Num frames 200...
-[2024-09-29 16:06:50,334][00191] Num frames 300...
-[2024-09-29 16:06:50,457][00191] Num frames 400...
-[2024-09-29 16:06:50,577][00191] Avg episode rewards: #0: 8.480, true rewards: #0: 4.480
-[2024-09-29 16:06:50,579][00191] Avg episode reward: 8.480, avg true_objective: 4.480
-[2024-09-29 16:06:50,644][00191] Num frames 500...
-[2024-09-29 16:06:50,771][00191] Num frames 600...
-[2024-09-29 16:06:50,890][00191] Num frames 700...
-[2024-09-29 16:06:51,011][00191] Num frames 800...
-[2024-09-29 16:06:51,133][00191] Num frames 900...
-[2024-09-29 16:06:51,255][00191] Num frames 1000...
-[2024-09-29 16:06:51,381][00191] Num frames 1100...
-[2024-09-29 16:06:51,506][00191] Num frames 1200...
-[2024-09-29 16:06:51,642][00191] Num frames 1300...
-[2024-09-29 16:06:51,768][00191] Num frames 1400...
-[2024-09-29 16:06:51,888][00191] Num frames 1500...
-[2024-09-29 16:06:52,007][00191] Num frames 1600...
-[2024-09-29 16:06:52,139][00191] Num frames 1700...
-[2024-09-29 16:06:52,282][00191] Num frames 1800...
-[2024-09-29 16:06:52,405][00191] Num frames 1900...
-[2024-09-29 16:06:52,558][00191] Avg episode rewards: #0: 22.395, true rewards: #0: 9.895
-[2024-09-29 16:06:52,559][00191] Avg episode reward: 22.395, avg true_objective: 9.895
-[2024-09-29 16:06:52,593][00191] Num frames 2000...
-[2024-09-29 16:06:52,714][00191] Num frames 2100...
-[2024-09-29 16:06:52,839][00191] Num frames 2200...
-[2024-09-29 16:06:52,963][00191] Num frames 2300...
-[2024-09-29 16:06:53,083][00191] Num frames 2400...
-[2024-09-29 16:06:53,264][00191] Avg episode rewards: #0: 17.970, true rewards: #0: 8.303
-[2024-09-29 16:06:53,265][00191] Avg episode reward: 17.970, avg true_objective: 8.303
-[2024-09-29 16:06:53,279][00191] Num frames 2500...
-[2024-09-29 16:06:53,397][00191] Num frames 2600...
-[2024-09-29 16:06:53,517][00191] Num frames 2700...
-[2024-09-29 16:06:53,653][00191] Num frames 2800...
-[2024-09-29 16:06:53,774][00191] Num frames 2900...
-[2024-09-29 16:06:53,894][00191] Num frames 3000...
-[2024-09-29 16:06:54,013][00191] Num frames 3100...
-[2024-09-29 16:06:54,135][00191] Num frames 3200...
-[2024-09-29 16:06:54,254][00191] Num frames 3300...
-[2024-09-29 16:06:54,381][00191] Num frames 3400...
-[2024-09-29 16:06:54,505][00191] Num frames 3500...
-[2024-09-29 16:06:54,671][00191] Avg episode rewards: #0: 19.448, true rewards: #0: 8.947
-[2024-09-29 16:06:54,673][00191] Avg episode reward: 19.448, avg true_objective: 8.947
-[2024-09-29 16:06:54,703][00191] Num frames 3600...
-[2024-09-29 16:06:54,821][00191] Num frames 3700...
-[2024-09-29 16:06:54,939][00191] Num frames 3800...
-[2024-09-29 16:06:55,056][00191] Num frames 3900...
-[2024-09-29 16:06:55,178][00191] Num frames 4000...
-[2024-09-29 16:06:55,307][00191] Avg episode rewards: #0: 16.918, true rewards: #0: 8.118
-[2024-09-29 16:06:55,308][00191] Avg episode reward: 16.918, avg true_objective: 8.118
-[2024-09-29 16:06:55,359][00191] Num frames 4100...
-[2024-09-29 16:06:55,480][00191] Num frames 4200...
-[2024-09-29 16:06:55,613][00191] Num frames 4300...
-[2024-09-29 16:06:55,734][00191] Num frames 4400...
-[2024-09-29 16:06:55,878][00191] Avg episode rewards: #0: 15.125, true rewards: #0: 7.458
-[2024-09-29 16:06:55,880][00191] Avg episode reward: 15.125, avg true_objective: 7.458
-[2024-09-29 16:06:55,912][00191] Num frames 4500...
-[2024-09-29 16:06:56,030][00191] Num frames 4600...
-[2024-09-29 16:06:56,151][00191] Num frames 4700...
-[2024-09-29 16:06:56,269][00191] Num frames 4800...
-[2024-09-29 16:06:56,391][00191] Num frames 4900...
-[2024-09-29 16:06:56,512][00191] Num frames 5000...
-[2024-09-29 16:06:56,591][00191] Avg episode rewards: #0: 14.170, true rewards: #0: 7.170
-[2024-09-29 16:06:56,593][00191] Avg episode reward: 14.170, avg true_objective: 7.170
-[2024-09-29 16:06:56,699][00191] Num frames 5100...
-[2024-09-29 16:06:56,816][00191] Num frames 5200...
-[2024-09-29 16:06:56,938][00191] Num frames 5300...
-[2024-09-29 16:06:57,056][00191] Num frames 5400...
-[2024-09-29 16:06:57,175][00191] Num frames 5500...
-[2024-09-29 16:06:57,332][00191] Num frames 5600...
-[2024-09-29 16:06:57,502][00191] Num frames 5700...
-[2024-09-29 16:06:57,689][00191] Num frames 5800...
-[2024-09-29 16:06:57,837][00191] Avg episode rewards: #0: 14.189, true rewards: #0: 7.314
-[2024-09-29 16:06:57,841][00191] Avg episode reward: 14.189, avg true_objective: 7.314
-[2024-09-29 16:06:57,925][00191] Num frames 5900...
-[2024-09-29 16:06:58,095][00191] Num frames 6000...
-[2024-09-29 16:06:58,265][00191] Num frames 6100...
-[2024-09-29 16:06:58,433][00191] Num frames 6200...
-[2024-09-29 16:06:58,616][00191] Num frames 6300...
-[2024-09-29 16:06:58,795][00191] Num frames 6400...
-[2024-09-29 16:06:58,983][00191] Num frames 6500...
-[2024-09-29 16:06:59,169][00191] Num frames 6600...
-[2024-09-29 16:06:59,341][00191] Num frames 6700...
-[2024-09-29 16:06:59,512][00191] Num frames 6800...
-[2024-09-29 16:06:59,741][00191] Avg episode rewards: #0: 15.107, true rewards: #0: 7.662
-[2024-09-29 16:06:59,745][00191] Avg episode reward: 15.107, avg true_objective: 7.662
-[2024-09-29 16:06:59,752][00191] Num frames 6900...
-[2024-09-29 16:06:59,885][00191] Num frames 7000...
-[2024-09-29 16:07:00,006][00191] Num frames 7100...
-[2024-09-29 16:07:00,129][00191] Num frames 7200...
-[2024-09-29 16:07:00,255][00191] Num frames 7300...
-[2024-09-29 16:07:00,383][00191] Num frames 7400...
-[2024-09-29 16:07:00,509][00191] Num frames 7500...
-[2024-09-29 16:07:00,655][00191] Num frames 7600...
-[2024-09-29 16:07:00,784][00191] Num frames 7700...
-[2024-09-29 16:07:00,914][00191] Num frames 7800...
-[2024-09-29 16:07:01,041][00191] Num frames 7900...
-[2024-09-29 16:07:01,161][00191] Num frames 8000...
-[2024-09-29 16:07:01,286][00191] Num frames 8100...
-[2024-09-29 16:07:01,407][00191] Num frames 8200...
-[2024-09-29 16:07:01,530][00191] Num frames 8300...
-[2024-09-29 16:07:01,661][00191] Num frames 8400...
-[2024-09-29 16:07:01,790][00191] Num frames 8500...
-[2024-09-29 16:07:01,926][00191] Num frames 8600...
-[2024-09-29 16:07:02,049][00191] Num frames 8700...
-[2024-09-29 16:07:02,176][00191] Num frames 8800...
-[2024-09-29 16:07:02,300][00191] Num frames 8900...
-[2024-09-29 16:07:02,475][00191] Avg episode rewards: #0: 19.596, true rewards: #0: 8.996
-[2024-09-29 16:07:02,478][00191] Avg episode reward: 19.596, avg true_objective: 8.996
-[2024-09-29 16:07:54,084][00191] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2024-09-29 16:10:31,478][00191] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-09-29 16:10:31,483][00191] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-09-29 16:10:31,485][00191] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-09-29 16:10:31,487][00191] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-09-29 16:10:31,491][00191] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-09-29 16:10:31,492][00191] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-09-29 16:10:31,495][00191] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2024-09-29 16:10:31,496][00191] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-09-29 16:10:31,498][00191] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2024-09-29 16:10:31,500][00191] Adding new argument 'hf_repository'='esperesa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2024-09-29 16:10:31,501][00191] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-09-29 16:10:31,507][00191] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-09-29 16:10:31,510][00191] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-09-29 16:10:31,512][00191] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-09-29 16:10:31,514][00191] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-09-29 16:10:31,552][00191] RunningMeanStd input shape: (3, 72, 128)
-[2024-09-29 16:10:31,555][00191] RunningMeanStd input shape: (1,)
-[2024-09-29 16:10:31,579][00191] ConvEncoder: input_channels=3
-[2024-09-29 16:10:31,638][00191] Conv encoder output size: 512
-[2024-09-29 16:10:31,639][00191] Policy head output size: 512
-[2024-09-29 16:10:31,663][00191] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-09-29 16:10:32,067][00191] Num frames 100...
-[2024-09-29 16:10:32,192][00191] Num frames 200...
-[2024-09-29 16:10:32,320][00191] Num frames 300...
-[2024-09-29 16:10:32,444][00191] Num frames 400...
-[2024-09-29 16:10:32,575][00191] Num frames 500...
-[2024-09-29 16:10:32,693][00191] Num frames 600...
-[2024-09-29 16:10:32,815][00191] Num frames 700...
-[2024-09-29 16:10:32,921][00191] Avg episode rewards: #0: 14.410, true rewards: #0: 7.410
-[2024-09-29 16:10:32,923][00191] Avg episode reward: 14.410, avg true_objective: 7.410
-[2024-09-29 16:10:32,993][00191] Num frames 800...
-[2024-09-29 16:10:33,121][00191] Num frames 900...
-[2024-09-29 16:10:33,245][00191] Num frames 1000...
-[2024-09-29 16:10:33,372][00191] Num frames 1100...
-[2024-09-29 16:10:33,441][00191] Avg episode rewards: #0: 11.555, true rewards: #0: 5.555
-[2024-09-29 16:10:33,442][00191] Avg episode reward: 11.555, avg true_objective: 5.555
-[2024-09-29 16:10:33,553][00191] Num frames 1200...
-[2024-09-29 16:10:33,677][00191] Num frames 1300...
-[2024-09-29 16:10:33,800][00191] Num frames 1400...
-[2024-09-29 16:10:33,924][00191] Num frames 1500...
-[2024-09-29 16:10:34,051][00191] Avg episode rewards: #0: 9.530, true rewards: #0: 5.197
-[2024-09-29 16:10:34,052][00191] Avg episode reward: 9.530, avg true_objective: 5.197
-[2024-09-29 16:10:34,103][00191] Num frames 1600...
-[2024-09-29 16:10:34,226][00191] Num frames 1700...
-[2024-09-29 16:10:34,349][00191] Num frames 1800...
-[2024-09-29 16:10:34,479][00191] Num frames 1900...
-[2024-09-29 16:10:34,612][00191] Num frames 2000...
-[2024-09-29 16:10:34,732][00191] Num frames 2100...
-[2024-09-29 16:10:34,854][00191] Num frames 2200...
-[2024-09-29 16:10:34,979][00191] Num frames 2300...
-[2024-09-29 16:10:35,100][00191] Num frames 2400...
-[2024-09-29 16:10:35,258][00191] Avg episode rewards: #0: 11.718, true rewards: #0: 6.217
-[2024-09-29 16:10:35,259][00191] Avg episode reward: 11.718, avg true_objective: 6.217
-[2024-09-29 16:10:35,278][00191] Num frames 2500...
-[2024-09-29 16:10:35,412][00191] Num frames 2600...
-[2024-09-29 16:10:35,534][00191] Num frames 2700...
-[2024-09-29 16:10:35,665][00191] Num frames 2800...
-[2024-09-29 16:10:35,784][00191] Num frames 2900...
-[2024-09-29 16:10:35,911][00191] Num frames 3000...
-[2024-09-29 16:10:36,080][00191] Avg episode rewards: #0: 11.190, true rewards: #0: 6.190
-[2024-09-29 16:10:36,082][00191] Avg episode reward: 11.190, avg true_objective: 6.190
-[2024-09-29 16:10:36,090][00191] Num frames 3100...
-[2024-09-29 16:10:36,208][00191] Num frames 3200...
-[2024-09-29 16:10:36,331][00191] Num frames 3300...
-[2024-09-29 16:10:36,471][00191] Num frames 3400...
-[2024-09-29 16:10:36,606][00191] Num frames 3500...
-[2024-09-29 16:10:36,731][00191] Num frames 3600...
-[2024-09-29 16:10:36,856][00191] Num frames 3700...
-[2024-09-29 16:10:36,983][00191] Num frames 3800...
-[2024-09-29 16:10:37,108][00191] Num frames 3900...
-[2024-09-29 16:10:37,233][00191] Num frames 4000...
-[2024-09-29 16:10:37,355][00191] Num frames 4100...
-[2024-09-29 16:10:37,485][00191] Num frames 4200...
-[2024-09-29 16:10:37,616][00191] Num frames 4300...
-[2024-09-29 16:10:37,735][00191] Num frames 4400...
-[2024-09-29 16:10:37,853][00191] Num frames 4500...
-[2024-09-29 16:10:37,973][00191] Num frames 4600...
-[2024-09-29 16:10:38,091][00191] Num frames 4700...
-[2024-09-29 16:10:38,215][00191] Num frames 4800...
-[2024-09-29 16:10:38,335][00191] Num frames 4900...
-[2024-09-29 16:10:38,467][00191] Num frames 5000...
-[2024-09-29 16:10:38,609][00191] Num frames 5100...
-[2024-09-29 16:10:38,782][00191] Avg episode rewards: #0: 18.991, true rewards: #0: 8.658
-[2024-09-29 16:10:38,784][00191] Avg episode reward: 18.991, avg true_objective: 8.658
-[2024-09-29 16:10:38,793][00191] Num frames 5200...
-[2024-09-29 16:10:38,915][00191] Num frames 5300...
-[2024-09-29 16:10:39,036][00191] Num frames 5400...
-[2024-09-29 16:10:39,159][00191] Num frames 5500...
-[2024-09-29 16:10:39,295][00191] Num frames 5600...
-[2024-09-29 16:10:39,418][00191] Num frames 5700...
-[2024-09-29 16:10:39,554][00191] Num frames 5800...
-[2024-09-29 16:10:39,696][00191] Num frames 5900...
-[2024-09-29 16:10:39,820][00191] Num frames 6000...
-[2024-09-29 16:10:39,946][00191] Num frames 6100...
-[2024-09-29 16:10:40,067][00191] Num frames 6200...
-[2024-09-29 16:10:40,186][00191] Avg episode rewards: #0: 19.216, true rewards: #0: 8.930
-[2024-09-29 16:10:40,188][00191] Avg episode reward: 19.216, avg true_objective: 8.930
-[2024-09-29 16:10:40,250][00191] Num frames 6300...
-[2024-09-29 16:10:40,371][00191] Num frames 6400...
-[2024-09-29 16:10:40,496][00191] Num frames 6500...
-[2024-09-29 16:10:40,637][00191] Num frames 6600...
-[2024-09-29 16:10:40,761][00191] Num frames 6700...
-[2024-09-29 16:10:40,886][00191] Num frames 6800...
-[2024-09-29 16:10:41,008][00191] Num frames 6900...
-[2024-09-29 16:10:41,132][00191] Num frames 7000...
-[2024-09-29 16:10:41,252][00191] Num frames 7100...
-[2024-09-29 16:10:41,379][00191] Num frames 7200...
-[2024-09-29 16:10:41,506][00191] Num frames 7300...
-[2024-09-29 16:10:41,668][00191] Num frames 7400...
-[2024-09-29 16:10:41,854][00191] Num frames 7500...
-[2024-09-29 16:10:41,917][00191] Avg episode rewards: #0: 20.502, true rewards: #0: 9.377
-[2024-09-29 16:10:41,920][00191] Avg episode reward: 20.502, avg true_objective: 9.377
-[2024-09-29 16:10:42,082][00191] Num frames 7600...
-[2024-09-29 16:10:42,249][00191] Num frames 7700...
-[2024-09-29 16:10:42,421][00191] Num frames 7800...
-[2024-09-29 16:10:42,593][00191] Num frames 7900...
-[2024-09-29 16:10:42,762][00191] Num frames 8000...
-[2024-09-29 16:10:42,933][00191] Num frames 8100...
-[2024-09-29 16:10:43,108][00191] Num frames 8200...
-[2024-09-29 16:10:43,231][00191] Avg episode rewards: #0: 19.709, true rewards: #0: 9.153
-[2024-09-29 16:10:43,233][00191] Avg episode reward: 19.709, avg true_objective: 9.153
-[2024-09-29 16:10:43,344][00191] Num frames 8300...
-[2024-09-29 16:10:43,532][00191] Num frames 8400...
-[2024-09-29 16:10:43,719][00191] Num frames 8500...
-[2024-09-29 16:10:43,899][00191] Num frames 8600...
-[2024-09-29 16:10:44,063][00191] Num frames 8700...
-[2024-09-29 16:10:44,190][00191] Num frames 8800...
-[2024-09-29 16:10:44,313][00191] Num frames 8900...
-[2024-09-29 16:10:44,437][00191] Num frames 9000...
-[2024-09-29 16:10:44,573][00191] Num frames 9100...
-[2024-09-29 16:10:44,700][00191] Num frames 9200...
-[2024-09-29 16:10:44,824][00191] Num frames 9300...
-[2024-09-29 16:10:44,948][00191] Num frames 9400...
-[2024-09-29 16:10:45,071][00191] Num frames 9500...
-[2024-09-29 16:10:45,197][00191] Num frames 9600...
-[2024-09-29 16:10:45,277][00191] Avg episode rewards: #0: 21.019, true rewards: #0: 9.619
-[2024-09-29 16:10:45,280][00191] Avg episode reward: 21.019, avg true_objective: 9.619
-[2024-09-29 16:11:39,958][00191] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-09-30 00:25:02,951][1148981] Using optimizer
+[2024-09-30 00:25:03,366][1148693] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 1148693], exiting...
+[2024-09-30 00:25:03,366][1148693] Runner profile tree view:
+main_loop: 2.3244
+[2024-09-30 00:25:03,367][1148693] Collected {}, FPS: 0.0
+[2024-09-30 00:25:03,367][1148981] Stopping Batcher_0...
+[2024-09-30 00:25:03,368][1148981] Loop batcher_evt_loop terminating...
+[2024-09-30 00:25:03,637][1148981] No checkpoints found
+[2024-09-30 00:25:03,637][1148981] Did not load from checkpoint, starting from scratch!
+[2024-09-30 00:25:03,637][1148981] Initialized policy 0 weights for model version 0
+[2024-09-30 00:25:03,639][1148981] LearnerWorker_p0 finished initialization!
+[2024-09-30 00:25:03,640][1148981] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-30 00:25:03,662][1148981] Stopping LearnerWorker_p0...
+[2024-09-30 00:25:03,662][1148981] Loop learner_proc0_evt_loop terminating...
+[2024-09-30 00:26:16,204][1149865] Saving configuration to /home/luyang/workspace/rl/train_dir/default_experiment/config.json...
+[2024-09-30 00:26:16,209][1149865] Rollout worker 0 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 1 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 2 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 3 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 4 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 5 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 6 uses device cpu
+[2024-09-30 00:26:16,209][1149865] Rollout worker 7 uses device cpu
+[2024-09-30 00:26:16,252][1149865] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:26:16,252][1149865] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-30 00:26:16,286][1149865] Starting all processes...
+[2024-09-30 00:26:16,286][1149865] Starting process learner_proc0
+[2024-09-30 00:26:17,897][1149865] Starting all processes...
+[2024-09-30 00:26:17,901][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:26:17,901][1150061] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-30 00:26:17,901][1149865] Starting process inference_proc0-0
+[2024-09-30 00:26:17,901][1149865] Starting process rollout_proc0
+[2024-09-30 00:26:17,901][1149865] Starting process rollout_proc1
+[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc2
+[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc3
+[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc4
+[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc5
+[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc6
+[2024-09-30 00:26:17,903][1149865] Starting process rollout_proc7
+[2024-09-30 00:26:17,953][1150061] Num visible devices: 1
+[2024-09-30 00:26:17,959][1150061] Starting seed is not provided
+[2024-09-30 00:26:17,959][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:26:17,959][1150061] Initializing actor-critic model on device cuda:0
+[2024-09-30 00:26:17,959][1150061] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-30 00:26:17,960][1150061] RunningMeanStd input shape: (1,)
+[2024-09-30 00:26:17,968][1150061] ConvEncoder: input_channels=3
+[2024-09-30 00:26:18,041][1150061] Conv encoder output size: 512
+[2024-09-30 00:26:18,041][1150061] Policy head output size: 512
+[2024-09-30 00:26:18,052][1150061] Created Actor Critic model with architecture:
+[2024-09-30 00:26:18,052][1150061] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-30 00:26:18,183][1150061] Using optimizer
+[2024-09-30 00:26:18,816][1150061] Loading state from checkpoint /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
+[2024-09-30 00:26:18,828][1150061] Loading model from checkpoint
+[2024-09-30 00:26:18,829][1150061] Loaded experiment state at self.train_step=0, self.env_steps=0
+[2024-09-30 00:26:18,829][1150061] Initialized policy 0 weights for model version 0
+[2024-09-30 00:26:18,831][1150061] LearnerWorker_p0 finished initialization!
+[2024-09-30 00:26:18,831][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:26:19,422][1150142] Worker 3 uses CPU cores [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
+[2024-09-30 00:26:19,449][1150140] Worker 7 uses CPU cores [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
+[2024-09-30 00:26:19,451][1150144] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
+[2024-09-30 00:26:19,456][1150145] Worker 6 uses CPU cores [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83]
+[2024-09-30 00:26:19,456][1150137] Worker 5 uses CPU cores [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]
+[2024-09-30 00:26:19,462][1150141] Worker 4 uses CPU cores [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
+[2024-09-30 00:26:19,465][1150143] Worker 1 uses CPU cores [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
+[2024-09-30 00:26:19,466][1150138] Worker 2 uses CPU cores [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
+[2024-09-30 00:26:19,483][1150139] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-30 00:26:19,484][1150139] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-30 00:26:19,545][1150139] Num visible devices: 1
+[2024-09-30 00:26:19,557][1149865] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-30 00:26:19,639][1150139] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-30 00:26:19,640][1150139] RunningMeanStd input shape: (1,)
+[2024-09-30 00:26:19,648][1150139] ConvEncoder: input_channels=3
+[2024-09-30 00:26:19,720][1150139] Conv encoder output size: 512
+[2024-09-30 00:26:19,720][1150139] Policy head output size: 512
+[2024-09-30 00:26:19,751][1149865] Inference worker 0-0 is ready!
+[2024-09-30 00:26:19,751][1149865] All inference workers are ready! Signal rollout workers to start!
+[2024-09-30 00:26:19,776][1150144] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,776][1150141] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,777][1150142] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,777][1150138] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,777][1150145] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,781][1150140] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,785][1150137] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:19,791][1150143] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:26:20,015][1150141] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,019][1150142] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,020][1150145] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,020][1150138] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,021][1150140] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,028][1150137] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,226][1150141] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:20,233][1150142] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:20,233][1150145] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:20,239][1150137] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:20,271][1150143] Decorrelating experience for 0 frames...
+[2024-09-30 00:26:20,481][1150143] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:20,496][1150145] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:20,508][1150142] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:20,739][1150141] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:20,745][1150142] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:20,759][1150137] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:20,987][1150141] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:20,991][1150137] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:20,993][1150143] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:21,227][1150143] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:21,234][1150145] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:21,489][1150138] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:21,652][1150061] Signal inference workers to stop experience collection...
+[2024-09-30 00:26:21,655][1150139] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-30 00:26:21,743][1150140] Decorrelating experience for 32 frames...
+[2024-09-30 00:26:21,758][1150138] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:21,995][1150138] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:22,002][1150140] Decorrelating experience for 64 frames...
+[2024-09-30 00:26:22,237][1150140] Decorrelating experience for 96 frames...
+[2024-09-30 00:26:22,624][1150061] Signal inference workers to resume experience collection...
+[2024-09-30 00:26:22,624][1150139] InferenceWorker_p0-w0: resuming experience collection
+[2024-09-30 00:26:23,854][1150139] Updated weights for policy 0, policy_version 10 (0.0128)
+[2024-09-30 00:26:24,557][1149865] Fps is (10 sec: 12288.1, 60 sec: 12288.1, 300 sec: 12288.1). Total num frames: 61440. Throughput: 0: 484.0. Samples: 2420. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2024-09-30 00:26:24,557][1149865] Avg episode reward: [(0, '4.453')]
+[2024-09-30 00:26:25,059][1150139] Updated weights for policy 0, policy_version 20 (0.0006)
+[2024-09-30 00:26:26,154][1150139] Updated weights for policy 0, policy_version 30 (0.0006)
+[2024-09-30 00:26:27,304][1150139] Updated weights for policy 0, policy_version 40 (0.0006)
+[2024-09-30 00:26:28,424][1150139] Updated weights for policy 0, policy_version 50 (0.0006)
+[2024-09-30 00:26:29,557][1149865] Fps is (10 sec: 24166.4, 60 sec: 24166.4, 300 sec: 24166.4). Total num frames: 241664. Throughput: 0: 5481.2. Samples: 54812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-30 00:26:29,557][1149865] Avg episode reward: [(0, '4.420')]
+[2024-09-30 00:26:29,562][1150061] Saving new best policy, reward=4.420!
+[2024-09-30 00:26:29,562][1150139] Updated weights for policy 0, policy_version 60 (0.0005)
+[2024-09-30 00:26:30,697][1150139] Updated weights for policy 0, policy_version 70 (0.0005)
+[2024-09-30 00:26:31,899][1150139] Updated weights for policy 0, policy_version 80 (0.0005)
+[2024-09-30 00:26:33,041][1150139] Updated weights for policy 0, policy_version 90 (0.0005)
+[2024-09-30 00:26:34,164][1150139] Updated weights for policy 0, policy_version 100 (0.0006)
+[2024-09-30 00:26:34,557][1149865] Fps is (10 sec: 36044.6, 60 sec: 28125.8, 300 sec: 28125.8). Total num frames: 421888. Throughput: 0: 5422.5. Samples: 81338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-30 00:26:34,557][1149865] Avg episode reward: [(0, '4.360')]
+[2024-09-30 00:26:35,313][1150139] Updated weights for policy 0, policy_version 110 (0.0006)
+[2024-09-30 00:26:36,243][1149865] Heartbeat connected on Batcher_0
+[2024-09-30 00:26:36,247][1149865] Heartbeat connected on LearnerWorker_p0
+[2024-09-30 00:26:36,254][1149865] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-30 00:26:36,261][1149865] Heartbeat connected on RolloutWorker_w1
+[2024-09-30 00:26:36,265][1149865] Heartbeat connected on RolloutWorker_w2
+[2024-09-30 00:26:36,270][1149865] Heartbeat connected on RolloutWorker_w3
+[2024-09-30 00:26:36,273][1149865] Heartbeat connected on RolloutWorker_w4
+[2024-09-30 00:26:36,278][1149865] Heartbeat connected on RolloutWorker_w5
+[2024-09-30 00:26:36,283][1149865] Heartbeat connected on RolloutWorker_w6
+[2024-09-30 00:26:36,286][1149865] Heartbeat connected on RolloutWorker_w7
+[2024-09-30 00:26:36,388][1150139] Updated weights for policy 0, policy_version 120 (0.0006)
+[2024-09-30 00:26:37,485][1150139] Updated weights for policy 0, policy_version 130 (0.0005)
+[2024-09-30 00:26:38,623][1150139] Updated weights for policy 0, policy_version 140 (0.0005)
+[2024-09-30 00:26:39,557][1149865] Fps is (10 sec: 36454.4, 60 sec: 30310.4, 300 sec: 30310.4). Total num frames: 606208. Throughput: 0: 6816.1. Samples: 136322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-30 00:26:39,557][1149865] Avg episode reward: [(0, '4.362')]
+[2024-09-30 00:26:39,793][1150139] Updated weights for policy 0, policy_version 150 (0.0005)
+[2024-09-30 00:26:40,942][1150139] Updated weights for policy 0, policy_version 160 (0.0006)
+[2024-09-30 00:26:42,120][1150139] Updated weights for policy 0, policy_version 170 (0.0006)
+[2024-09-30 00:26:43,255][1150139] Updated weights for policy 0, policy_version 180 (0.0006)
+[2024-09-30 00:26:44,363][1150139] Updated weights for policy 0, policy_version 190 (0.0006)
+[2024-09-30 00:26:44,557][1149865] Fps is (10 sec: 36044.9, 60 sec: 31293.4, 300 sec: 31293.4). Total num frames: 782336. Throughput: 0: 7583.6. Samples: 189590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-30 00:26:44,557][1149865] Avg episode reward: [(0, '4.718')]
+[2024-09-30 00:26:44,560][1150061] Saving new best policy, reward=4.718!
+[2024-09-30 00:26:45,498][1150139] Updated weights for policy 0, policy_version 200 (0.0006)
+[2024-09-30 00:26:46,658][1150139] Updated weights for policy 0, policy_version 210 (0.0005)
+[2024-09-30 00:26:47,749][1150139] Updated weights for policy 0, policy_version 220 (0.0005)
+[2024-09-30 00:26:48,879][1150139] Updated weights for policy 0, policy_version 230 (0.0006)
+[2024-09-30 00:26:49,557][1149865] Fps is (10 sec: 36044.5, 60 sec: 32221.8, 300 sec: 32221.8). Total num frames: 966656. Throughput: 0: 7214.5. Samples: 216436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-30 00:26:49,558][1149865] Avg episode reward: [(0, '4.984')]
+[2024-09-30 00:26:49,558][1150061] Saving new best policy, reward=4.984!
+[2024-09-30 00:26:49,996][1150139] Updated weights for policy 0, policy_version 240 (0.0006)
+[2024-09-30 00:26:51,078][1150139] Updated weights for policy 0, policy_version 250 (0.0005)
+[2024-09-30 00:26:52,165][1150139] Updated weights for policy 0, policy_version 260 (0.0005)
+[2024-09-30 00:26:53,360][1150139] Updated weights for policy 0, policy_version 270 (0.0006)
+[2024-09-30 00:26:54,557][1149865] Fps is (10 sec: 36044.3, 60 sec: 32650.8, 300 sec: 32650.8). Total num frames: 1142784. Throughput: 0: 7775.5. Samples: 272144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-30 00:26:54,558][1149865] Avg episode reward: [(0, '6.825')]
+[2024-09-30 00:26:54,561][1150061] Saving new best policy, reward=6.825!
+[2024-09-30 00:26:54,627][1150139] Updated weights for policy 0, policy_version 280 (0.0006)
+[2024-09-30 00:26:55,804][1150139] Updated weights for policy 0, policy_version 290 (0.0006)
+[2024-09-30 00:26:56,921][1150139] Updated weights for policy 0, policy_version 300 (0.0006)
+[2024-09-30 00:26:58,061][1150139] Updated weights for policy 0, policy_version 310 (0.0006)
+[2024-09-30 00:26:59,306][1150139] Updated weights for policy 0, policy_version 320 (0.0006)
+[2024-09-30 00:26:59,557][1149865] Fps is (10 sec: 35225.8, 60 sec: 32972.8, 300 sec: 32972.8). Total num frames: 1318912. Throughput: 0: 8094.2. Samples: 323770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-30 00:26:59,557][1149865] Avg episode reward: [(0, '7.969')]
+[2024-09-30 00:26:59,558][1150061] Saving new best policy, reward=7.969!
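
A note on reading the status lines: each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports environment frames per second averaged over three sliding windows, alongside cumulative frames, sampler throughput, and policy lag. The sketch below shows one way such windowed rates can be computed from (timestamp, total frames) samples; it is illustrative only, not Sample Factory's actual implementation.

# Illustrative only: deriving windowed FPS figures like
# "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" from periodic
# (timestamp, total_frames) samples.
from collections import deque
import time

class FpsMeter:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs, oldest first

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        # keep slightly more history than the largest window needs
        while self.samples and now - self.samples[0][0] > max(self.windows) + 1:
            self.samples.popleft()

    def fps(self, window):
        if len(self.samples) < 2:
            return float("nan")  # matches the "nan" in the very first report
        now, frames_now = self.samples[-1]
        # oldest sample still inside the window
        past = next(((t, f) for t, f in self.samples if now - t <= window),
                    self.samples[0])
        dt = now - past[0]
        return (frames_now - past[1]) / dt if dt > 0 else float("nan")
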
+[2024-09-30 00:27:00,379][1150139] Updated weights for policy 0, policy_version 330 (0.0006)
+[2024-09-30 00:27:01,493][1150139] Updated weights for policy 0, policy_version 340 (0.0006)
+[2024-09-30 00:27:02,648][1150139] Updated weights for policy 0, policy_version 350 (0.0005)
+[2024-09-30 00:27:03,846][1150139] Updated weights for policy 0, policy_version 360 (0.0006)
+[2024-09-30 00:27:04,557][1149865] Fps is (10 sec: 35635.7, 60 sec: 33314.1, 300 sec: 33314.1). Total num frames: 1499136. Throughput: 0: 7803.8. Samples: 351172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-30 00:27:04,557][1149865] Avg episode reward: [(0, '9.395')]
+[2024-09-30 00:27:04,560][1150061] Saving new best policy, reward=9.395!
+[2024-09-30 00:27:04,958][1150139] Updated weights for policy 0, policy_version 370 (0.0005)
+[2024-09-30 00:27:06,158][1150139] Updated weights for policy 0, policy_version 380 (0.0006)
+[2024-09-30 00:27:07,341][1150139] Updated weights for policy 0, policy_version 390 (0.0006)
+[2024-09-30 00:27:08,471][1150139] Updated weights for policy 0, policy_version 400 (0.0006)
+[2024-09-30 00:27:09,550][1150139] Updated weights for policy 0, policy_version 410 (0.0005)
+[2024-09-30 00:27:09,557][1149865] Fps is (10 sec: 36045.1, 60 sec: 33587.2, 300 sec: 33587.2). Total num frames: 1679360. Throughput: 0: 8916.3. Samples: 403652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-30 00:27:09,557][1149865] Avg episode reward: [(0, '10.451')]
+[2024-09-30 00:27:09,558][1150061] Saving new best policy, reward=10.451!
+[2024-09-30 00:27:10,625][1150139] Updated weights for policy 0, policy_version 420 (0.0005)
+[2024-09-30 00:27:11,753][1150139] Updated weights for policy 0, policy_version 430 (0.0006)
+[2024-09-30 00:27:12,906][1150139] Updated weights for policy 0, policy_version 440 (0.0005)
+[2024-09-30 00:27:14,043][1150139] Updated weights for policy 0, policy_version 450 (0.0005)
+[2024-09-30 00:27:14,557][1149865] Fps is (10 sec: 36044.8, 60 sec: 33810.6, 300 sec: 33810.6). Total num frames: 1859584. Throughput: 0: 8974.4. Samples: 458662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-30 00:27:14,557][1149865] Avg episode reward: [(0, '13.438')]
+[2024-09-30 00:27:14,560][1150061] Saving new best policy, reward=13.438!
+[2024-09-30 00:27:15,159][1150139] Updated weights for policy 0, policy_version 460 (0.0006)
+[2024-09-30 00:27:16,224][1150139] Updated weights for policy 0, policy_version 470 (0.0006)
+[2024-09-30 00:27:17,339][1150139] Updated weights for policy 0, policy_version 480 (0.0006)
+[2024-09-30 00:27:18,411][1150139] Updated weights for policy 0, policy_version 490 (0.0006)
+[2024-09-30 00:27:19,490][1150139] Updated weights for policy 0, policy_version 500 (0.0006)
+[2024-09-30 00:27:19,557][1149865] Fps is (10 sec: 36863.4, 60 sec: 34133.3, 300 sec: 34133.3). Total num frames: 2048000. Throughput: 0: 9008.8. Samples: 486736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-30 00:27:19,558][1149865] Avg episode reward: [(0, '15.719')]
+[2024-09-30 00:27:19,558][1150061] Saving new best policy, reward=15.719!
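
"Policy #0 lag" summarizes, for the transitions in a training batch, how many policy versions old the behavior policy was when the data was collected; with the learner publishing new weights roughly every 110 ms here (the "Updated weights ... policy_version N" lines), a max lag of one or two versions is what asynchronous PPO is expected to show. A hypothetical illustration of the bookkeeping, not Sample Factory's code:

# Illustrative: summarising policy-version lag over a training batch.
def policy_lag_stats(batch_policy_versions, current_version):
    """batch_policy_versions: version of the policy that generated each
    transition in the batch; current_version: the learner's latest version."""
    lags = [current_version - v for v in batch_policy_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. a batch collected under versions 318..320 while the learner is at 320
print(policy_lag_stats([318, 319, 320, 320], 320))  # -> (0, 0.75, 2)
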
+[2024-09-30 00:27:20,560][1150139] Updated weights for policy 0, policy_version 510 (0.0006)
+[2024-09-30 00:27:21,675][1150139] Updated weights for policy 0, policy_version 520 (0.0006)
+[2024-09-30 00:27:22,733][1150139] Updated weights for policy 0, policy_version 530 (0.0006)
+[2024-09-30 00:27:23,821][1150139] Updated weights for policy 0, policy_version 540 (0.0006)
+[2024-09-30 00:27:24,557][1149865] Fps is (10 sec: 37683.2, 60 sec: 36249.6, 300 sec: 34406.4). Total num frames: 2236416. Throughput: 0: 9047.6. Samples: 543462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-30 00:27:24,557][1149865] Avg episode reward: [(0, '18.072')]
+[2024-09-30 00:27:24,569][1150061] Saving new best policy, reward=18.072!
+[2024-09-30 00:27:24,893][1150139] Updated weights for policy 0, policy_version 550 (0.0006)
+[2024-09-30 00:27:25,971][1150139] Updated weights for policy 0, policy_version 560 (0.0005)
+[2024-09-30 00:27:27,037][1150139] Updated weights for policy 0, policy_version 570 (0.0006)
+[2024-09-30 00:27:28,155][1150139] Updated weights for policy 0, policy_version 580 (0.0005)
+[2024-09-30 00:27:29,272][1150139] Updated weights for policy 0, policy_version 590 (0.0006)
+[2024-09-30 00:27:29,557][1149865] Fps is (10 sec: 37683.7, 60 sec: 36386.1, 300 sec: 34640.5). Total num frames: 2424832. Throughput: 0: 9118.1. Samples: 599906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-30 00:27:29,557][1149865] Avg episode reward: [(0, '20.999')]
+[2024-09-30 00:27:29,558][1150061] Saving new best policy, reward=20.999!
+[2024-09-30 00:27:30,323][1150139] Updated weights for policy 0, policy_version 600 (0.0006)
+[2024-09-30 00:27:31,407][1150139] Updated weights for policy 0, policy_version 610 (0.0006)
+[2024-09-30 00:27:32,467][1150139] Updated weights for policy 0, policy_version 620 (0.0006)
+[2024-09-30 00:27:33,593][1150139] Updated weights for policy 0, policy_version 630 (0.0006)
+[2024-09-30 00:27:34,557][1149865] Fps is (10 sec: 37683.3, 60 sec: 36522.7, 300 sec: 34843.3). Total num frames: 2613248. Throughput: 0: 9160.0. Samples: 628636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-30 00:27:34,557][1149865] Avg episode reward: [(0, '20.389')]
+[2024-09-30 00:27:34,683][1150139] Updated weights for policy 0, policy_version 640 (0.0005)
+[2024-09-30 00:27:35,755][1150139] Updated weights for policy 0, policy_version 650 (0.0006)
+[2024-09-30 00:27:36,830][1150139] Updated weights for policy 0, policy_version 660 (0.0006)
+[2024-09-30 00:27:37,890][1150139] Updated weights for policy 0, policy_version 670 (0.0006)
+[2024-09-30 00:27:38,971][1150139] Updated weights for policy 0, policy_version 680 (0.0006)
+[2024-09-30 00:27:39,557][1149865] Fps is (10 sec: 38092.7, 60 sec: 36659.2, 300 sec: 35072.0). Total num frames: 2805760. Throughput: 0: 9181.5. Samples: 685308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-30 00:27:39,557][1149865] Avg episode reward: [(0, '19.388')]
+[2024-09-30 00:27:40,031][1150139] Updated weights for policy 0, policy_version 690 (0.0006)
+[2024-09-30 00:27:41,100][1150139] Updated weights for policy 0, policy_version 700 (0.0005)
+[2024-09-30 00:27:42,153][1150139] Updated weights for policy 0, policy_version 710 (0.0006)
+[2024-09-30 00:27:43,230][1150139] Updated weights for policy 0, policy_version 720 (0.0006)
+[2024-09-30 00:27:44,312][1150139] Updated weights for policy 0, policy_version 730 (0.0006)
+[2024-09-30 00:27:44,557][1149865] Fps is (10 sec: 38502.4, 60 sec: 36932.3, 300 sec: 35273.8). Total num frames: 2998272. Throughput: 0: 9311.9. Samples: 742806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-30 00:27:44,557][1149865] Avg episode reward: [(0, '22.197')]
+[2024-09-30 00:27:44,560][1150061] Saving new best policy, reward=22.197!
+[2024-09-30 00:27:45,376][1150139] Updated weights for policy 0, policy_version 740 (0.0006)
+[2024-09-30 00:27:46,438][1150139] Updated weights for policy 0, policy_version 750 (0.0006)
+[2024-09-30 00:27:47,515][1150139] Updated weights for policy 0, policy_version 760 (0.0005)
+[2024-09-30 00:27:48,572][1150139] Updated weights for policy 0, policy_version 770 (0.0006)
+[2024-09-30 00:27:49,557][1149865] Fps is (10 sec: 38502.4, 60 sec: 37068.9, 300 sec: 35453.2). Total num frames: 3190784. Throughput: 0: 9343.1. Samples: 771610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-30 00:27:49,557][1149865] Avg episode reward: [(0, '21.389')]
+[2024-09-30 00:27:49,645][1150139] Updated weights for policy 0, policy_version 780 (0.0006)
+[2024-09-30 00:27:50,716][1150139] Updated weights for policy 0, policy_version 790 (0.0005)
+[2024-09-30 00:27:51,800][1150139] Updated weights for policy 0, policy_version 800 (0.0006)
+[2024-09-30 00:27:52,845][1150139] Updated weights for policy 0, policy_version 810 (0.0006)
+[2024-09-30 00:27:53,913][1150139] Updated weights for policy 0, policy_version 820 (0.0005)
+[2024-09-30 00:27:54,557][1149865] Fps is (10 sec: 38092.7, 60 sec: 37273.7, 300 sec: 35570.5). Total num frames: 3379200. Throughput: 0: 9454.0. Samples: 829084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-30 00:27:54,557][1149865] Avg episode reward: [(0, '25.173')]
+[2024-09-30 00:27:54,560][1150061] Saving new best policy, reward=25.173!
+[2024-09-30 00:27:54,982][1150139] Updated weights for policy 0, policy_version 830 (0.0006)
+[2024-09-30 00:27:56,101][1150139] Updated weights for policy 0, policy_version 840 (0.0006)
+[2024-09-30 00:27:57,180][1150139] Updated weights for policy 0, policy_version 850 (0.0005)
+[2024-09-30 00:27:58,258][1150139] Updated weights for policy 0, policy_version 860 (0.0006)
+[2024-09-30 00:27:59,351][1150139] Updated weights for policy 0, policy_version 870 (0.0005)
+[2024-09-30 00:27:59,557][1149865] Fps is (10 sec: 38092.9, 60 sec: 37546.7, 300 sec: 35717.1). Total num frames: 3571712. Throughput: 0: 9494.8. Samples: 885926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-30 00:27:59,557][1149865] Avg episode reward: [(0, '21.345')]
+[2024-09-30 00:28:00,433][1150139] Updated weights for policy 0, policy_version 880 (0.0006)
+[2024-09-30 00:28:01,499][1150139] Updated weights for policy 0, policy_version 890 (0.0006)
+[2024-09-30 00:28:02,611][1150139] Updated weights for policy 0, policy_version 900 (0.0006)
+[2024-09-30 00:28:03,681][1150139] Updated weights for policy 0, policy_version 910 (0.0006)
+[2024-09-30 00:28:04,557][1149865] Fps is (10 sec: 38092.8, 60 sec: 37683.2, 300 sec: 35810.7). Total num frames: 3760128. Throughput: 0: 9499.9. Samples: 914232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-30 00:28:04,557][1149865] Avg episode reward: [(0, '25.023')]
+[2024-09-30 00:28:04,751][1150139] Updated weights for policy 0, policy_version 920 (0.0006)
+[2024-09-30 00:28:05,834][1150139] Updated weights for policy 0, policy_version 930 (0.0006)
+[2024-09-30 00:28:07,053][1150139] Updated weights for policy 0, policy_version 940 (0.0006)
+[2024-09-30 00:28:08,307][1150139] Updated weights for policy 0, policy_version 950 (0.0006)
+[2024-09-30 00:28:09,418][1150139] Updated weights for policy 0, policy_version 960 (0.0006)
+[2024-09-30 00:28:09,557][1149865] Fps is (10 sec: 36453.7, 60 sec: 37614.8, 300 sec: 35784.1). Total num frames: 3936256. Throughput: 0: 9452.1. Samples: 968808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-30 00:28:09,558][1149865] Avg episode reward: [(0, '26.893')]
+[2024-09-30 00:28:09,558][1150061] Saving new best policy, reward=26.893!
+[2024-09-30 00:28:10,576][1150139] Updated weights for policy 0, policy_version 970 (0.0006)
+[2024-09-30 00:28:11,497][1149865] Component Batcher_0 stopped!
+[2024-09-30 00:28:11,497][1150061] Stopping Batcher_0...
+[2024-09-30 00:28:11,497][1149865] Component RolloutWorker_w0 process died already! Don't wait for it.
+[2024-09-30 00:28:11,497][1150061] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-30 00:28:11,498][1150061] Loop batcher_evt_loop terminating...
+[2024-09-30 00:28:11,513][1150139] Weights refcount: 2 0
+[2024-09-30 00:28:11,514][1150139] Stopping InferenceWorker_p0-w0...
+[2024-09-30 00:28:11,514][1150139] Loop inference_proc0-0_evt_loop terminating...
+[2024-09-30 00:28:11,514][1149865] Component InferenceWorker_p0-w0 stopped!
+[2024-09-30 00:28:11,527][1150138] Stopping RolloutWorker_w2...
+[2024-09-30 00:28:11,527][1149865] Component RolloutWorker_w2 stopped!
+[2024-09-30 00:28:11,528][1150138] Loop rollout_proc2_evt_loop terminating...
+[2024-09-30 00:28:11,530][1150142] Stopping RolloutWorker_w3...
+[2024-09-30 00:28:11,530][1149865] Component RolloutWorker_w3 stopped!
+[2024-09-30 00:28:11,530][1150142] Loop rollout_proc3_evt_loop terminating...
+[2024-09-30 00:28:11,531][1149865] Component RolloutWorker_w5 stopped!
+[2024-09-30 00:28:11,531][1150137] Stopping RolloutWorker_w5...
+[2024-09-30 00:28:11,531][1149865] Component RolloutWorker_w6 stopped!
+[2024-09-30 00:28:11,531][1150145] Stopping RolloutWorker_w6...
+[2024-09-30 00:28:11,531][1150137] Loop rollout_proc5_evt_loop terminating...
+[2024-09-30 00:28:11,531][1150145] Loop rollout_proc6_evt_loop terminating...
+[2024-09-30 00:28:11,532][1149865] Component RolloutWorker_w1 stopped!
+[2024-09-30 00:28:11,532][1150143] Stopping RolloutWorker_w1...
+[2024-09-30 00:28:11,533][1150143] Loop rollout_proc1_evt_loop terminating...
+[2024-09-30 00:28:11,533][1149865] Component RolloutWorker_w4 stopped!
+[2024-09-30 00:28:11,533][1150141] Stopping RolloutWorker_w4...
+[2024-09-30 00:28:11,533][1150141] Loop rollout_proc4_evt_loop terminating...
+[2024-09-30 00:28:11,536][1149865] Component RolloutWorker_w7 stopped!
+[2024-09-30 00:28:11,536][1150140] Stopping RolloutWorker_w7...
+[2024-09-30 00:28:11,536][1150140] Loop rollout_proc7_evt_loop terminating...
+[2024-09-30 00:28:11,548][1150061] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-30 00:28:11,677][1150061] Stopping LearnerWorker_p0...
+[2024-09-30 00:28:11,677][1150061] Loop learner_proc0_evt_loop terminating...
+[2024-09-30 00:28:11,677][1149865] Component LearnerWorker_p0 stopped!
+[2024-09-30 00:28:11,678][1149865] Waiting for process learner_proc0 to stop...
+[2024-09-30 00:28:12,213][1149865] Waiting for process inference_proc0-0 to join...
+[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc0 to join...
+[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc1 to join...
+[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc2 to join...
+[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc3 to join...
+[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc4 to join...
+[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc5 to join...
+[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc6 to join...
+[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc7 to join...
+[2024-09-30 00:28:12,215][1149865] Batcher 0 profile tree view:
+batching: 8.1702, releasing_batches: 0.0148
+[2024-09-30 00:28:12,215][1149865] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 2.2430
+update_model: 1.6718
+  weight_update: 0.0006
+one_step: 0.0013
+  handle_policy_step: 101.6105
+    deserialize: 4.2527, stack: 0.5251, obs_to_device_normalize: 21.3149, forward: 52.1177, send_messages: 6.7725
+    prepare_outputs: 11.8901
+      to_cpu: 6.4354
+[2024-09-30 00:28:12,216][1149865] Learner 0 profile tree view:
+misc: 0.0031, prepare_batch: 4.0428
+train: 10.3860
+  epoch_init: 0.0033, minibatch_init: 0.0037, losses_postprocess: 0.1662, kl_divergence: 0.2113, after_optimizer: 0.8304
+  calculate_losses: 4.6270
+    losses_init: 0.0020, forward_head: 0.3762, bptt_initial: 2.3802, tail: 0.3318, advantages_returns: 0.0873, losses: 0.6229
+    bptt: 0.7209
+      bptt_forward_core: 0.6909
+  update: 4.3204
+    clip: 0.4495
+[2024-09-30 00:28:12,216][1149865] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0819, enqueue_policy_requests: 4.5546, env_step: 67.0989, overhead: 3.2408, complete_rollouts: 0.1226
+save_policy_outputs: 5.6070
+  split_output_tensors: 1.8787
+[2024-09-30 00:28:12,216][1149865] Loop Runner_EvtLoop terminating...
+[2024-09-30 00:28:12,216][1149865] Runner profile tree view:
+main_loop: 115.9303
+[2024-09-30 00:28:12,216][1149865] Collected {0: 4005888}, FPS: 34554.3
+[2024-09-30 00:28:12,419][1149865] Loading existing experiment configuration from /home/luyang/workspace/rl/train_dir/default_experiment/config.json
+[2024-09-30 00:28:12,419][1149865] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'hf_repository'='esperesa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-30 00:28:12,420][1149865] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-30 00:28:12,441][1149865] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-30 00:28:12,443][1149865] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-30 00:28:12,443][1149865] RunningMeanStd input shape: (1,)
+[2024-09-30 00:28:12,452][1149865] ConvEncoder: input_channels=3
+[2024-09-30 00:28:12,522][1149865] Conv encoder output size: 512
+[2024-09-30 00:28:12,522][1149865] Policy head output size: 512
+[2024-09-30 00:28:12,681][1149865] Loading state from checkpoint /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-30 00:28:13,271][1149865] Num frames 100...
+[2024-09-30 00:28:13,350][1149865] Num frames 200...
+[2024-09-30 00:28:13,427][1149865] Num frames 300...
+[2024-09-30 00:28:13,503][1149865] Num frames 400...
+[2024-09-30 00:28:13,581][1149865] Num frames 500...
+[2024-09-30 00:28:13,659][1149865] Num frames 600...
+[2024-09-30 00:28:13,738][1149865] Num frames 700...
+[2024-09-30 00:28:13,815][1149865] Num frames 800...
+[2024-09-30 00:28:13,894][1149865] Num frames 900...
+[2024-09-30 00:28:13,973][1149865] Num frames 1000...
+[2024-09-30 00:28:14,052][1149865] Num frames 1100...
+[2024-09-30 00:28:14,132][1149865] Num frames 1200...
+[2024-09-30 00:28:14,208][1149865] Num frames 1300...
+[2024-09-30 00:28:14,285][1149865] Num frames 1400...
+[2024-09-30 00:28:14,364][1149865] Num frames 1500...
+[2024-09-30 00:28:14,445][1149865] Avg episode rewards: #0: 34.360, true rewards: #0: 15.360
+[2024-09-30 00:28:14,445][1149865] Avg episode reward: 34.360, avg true_objective: 15.360
+[2024-09-30 00:28:14,497][1149865] Num frames 1600...
+[2024-09-30 00:28:14,575][1149865] Num frames 1700...
+[2024-09-30 00:28:14,654][1149865] Num frames 1800...
+[2024-09-30 00:28:14,733][1149865] Num frames 1900...
+[2024-09-30 00:28:14,811][1149865] Num frames 2000...
+[2024-09-30 00:28:14,891][1149865] Num frames 2100...
+[2024-09-30 00:28:14,967][1149865] Num frames 2200...
+[2024-09-30 00:28:15,045][1149865] Num frames 2300...
+[2024-09-30 00:28:15,125][1149865] Num frames 2400...
+[2024-09-30 00:28:15,204][1149865] Num frames 2500...
+[2024-09-30 00:28:15,283][1149865] Num frames 2600...
+[2024-09-30 00:28:15,363][1149865] Num frames 2700...
+[2024-09-30 00:28:15,443][1149865] Num frames 2800...
+[2024-09-30 00:28:15,523][1149865] Num frames 2900...
+[2024-09-30 00:28:15,602][1149865] Num frames 3000...
+[2024-09-30 00:28:15,681][1149865] Num frames 3100...
+[2024-09-30 00:28:15,758][1149865] Num frames 3200...
+[2024-09-30 00:28:15,834][1149865] Num frames 3300...
+[2024-09-30 00:28:15,958][1149865] Avg episode rewards: #0: 40.959, true rewards: #0: 16.960
+[2024-09-30 00:28:15,958][1149865] Avg episode reward: 40.959, avg true_objective: 16.960
+[2024-09-30 00:28:15,966][1149865] Num frames 3400...
+[2024-09-30 00:28:16,049][1149865] Num frames 3500...
+[2024-09-30 00:28:16,127][1149865] Num frames 3600...
+[2024-09-30 00:28:16,205][1149865] Num frames 3700...
+[2024-09-30 00:28:16,284][1149865] Num frames 3800...
+[2024-09-30 00:28:16,363][1149865] Num frames 3900...
+[2024-09-30 00:28:16,443][1149865] Num frames 4000...
+[2024-09-30 00:28:16,521][1149865] Num frames 4100...
+[2024-09-30 00:28:16,598][1149865] Num frames 4200...
+[2024-09-30 00:28:16,675][1149865] Num frames 4300...
+[2024-09-30 00:28:16,754][1149865] Num frames 4400...
+[2024-09-30 00:28:16,832][1149865] Num frames 4500...
+[2024-09-30 00:28:16,919][1149865] Avg episode rewards: #0: 36.480, true rewards: #0: 15.147
+[2024-09-30 00:28:16,919][1149865] Avg episode reward: 36.480, avg true_objective: 15.147
+[2024-09-30 00:28:16,968][1149865] Num frames 4600...
+[2024-09-30 00:28:17,047][1149865] Num frames 4700...
+[2024-09-30 00:28:17,126][1149865] Num frames 4800...
+[2024-09-30 00:28:17,206][1149865] Num frames 4900...
+[2024-09-30 00:28:17,282][1149865] Num frames 5000...
+[2024-09-30 00:28:17,358][1149865] Num frames 5100...
+[2024-09-30 00:28:17,470][1149865] Avg episode rewards: #0: 31.192, true rewards: #0: 12.942
+[2024-09-30 00:28:17,471][1149865] Avg episode reward: 31.192, avg true_objective: 12.942
+[2024-09-30 00:28:17,490][1149865] Num frames 5200...
+[2024-09-30 00:28:17,569][1149865] Num frames 5300...
+[2024-09-30 00:28:17,649][1149865] Num frames 5400...
+[2024-09-30 00:28:17,729][1149865] Num frames 5500...
+[2024-09-30 00:28:17,807][1149865] Num frames 5600...
+[2024-09-30 00:28:17,887][1149865] Num frames 5700...
+[2024-09-30 00:28:17,966][1149865] Num frames 5800...
+[2024-09-30 00:28:18,043][1149865] Num frames 5900...
+[2024-09-30 00:28:18,120][1149865] Num frames 6000...
+[2024-09-30 00:28:18,196][1149865] Num frames 6100...
+[2024-09-30 00:28:18,274][1149865] Num frames 6200...
+[2024-09-30 00:28:18,352][1149865] Num frames 6300...
+[2024-09-30 00:28:18,477][1149865] Avg episode rewards: #0: 30.786, true rewards: #0: 12.786
+[2024-09-30 00:28:18,477][1149865] Avg episode reward: 30.786, avg true_objective: 12.786
+[2024-09-30 00:28:18,484][1149865] Num frames 6400...
+[2024-09-30 00:28:18,564][1149865] Num frames 6500...
+[2024-09-30 00:28:18,643][1149865] Num frames 6600...
+[2024-09-30 00:28:18,723][1149865] Num frames 6700...
+[2024-09-30 00:28:18,803][1149865] Num frames 6800...
+[2024-09-30 00:28:18,879][1149865] Num frames 6900...
+[2024-09-30 00:28:18,957][1149865] Num frames 7000...
+[2024-09-30 00:28:19,033][1149865] Num frames 7100...
+[2024-09-30 00:28:19,110][1149865] Num frames 7200...
+[2024-09-30 00:28:19,190][1149865] Num frames 7300...
+[2024-09-30 00:28:19,271][1149865] Num frames 7400...
+[2024-09-30 00:28:19,353][1149865] Num frames 7500...
+[2024-09-30 00:28:19,443][1149865] Num frames 7600...
+[2024-09-30 00:28:19,538][1149865] Num frames 7700...
+[2024-09-30 00:28:19,630][1149865] Num frames 7800...
+[2024-09-30 00:28:19,721][1149865] Num frames 7900...
+[2024-09-30 00:28:19,817][1149865] Num frames 8000...
+[2024-09-30 00:28:19,910][1149865] Num frames 8100...
+[2024-09-30 00:28:20,014][1149865] Avg episode rewards: #0: 33.588, true rewards: #0: 13.588
+[2024-09-30 00:28:20,015][1149865] Avg episode reward: 33.588, avg true_objective: 13.588
+[2024-09-30 00:28:20,062][1149865] Num frames 8200...
+[2024-09-30 00:28:20,156][1149865] Num frames 8300...
+[2024-09-30 00:28:20,246][1149865] Num frames 8400...
+[2024-09-30 00:28:20,341][1149865] Num frames 8500...
+[2024-09-30 00:28:20,433][1149865] Num frames 8600...
+[2024-09-30 00:28:20,523][1149865] Num frames 8700...
+[2024-09-30 00:28:20,616][1149865] Num frames 8800...
+[2024-09-30 00:28:20,707][1149865] Num frames 8900...
+[2024-09-30 00:28:20,800][1149865] Num frames 9000...
+[2024-09-30 00:28:20,892][1149865] Num frames 9100...
+[2024-09-30 00:28:20,986][1149865] Num frames 9200...
+[2024-09-30 00:28:21,079][1149865] Num frames 9300...
+[2024-09-30 00:28:21,172][1149865] Num frames 9400...
+[2024-09-30 00:28:21,264][1149865] Num frames 9500...
+[2024-09-30 00:28:21,356][1149865] Num frames 9600...
+[2024-09-30 00:28:21,451][1149865] Num frames 9700...
+[2024-09-30 00:28:21,526][1149865] Avg episode rewards: #0: 34.030, true rewards: #0: 13.887
+[2024-09-30 00:28:21,526][1149865] Avg episode reward: 34.030, avg true_objective: 13.887
+[2024-09-30 00:28:21,591][1149865] Num frames 9800...
+[2024-09-30 00:28:21,672][1149865] Num frames 9900...
+[2024-09-30 00:28:21,757][1149865] Num frames 10000...
+[2024-09-30 00:28:21,850][1149865] Num frames 10100...
+[2024-09-30 00:28:21,945][1149865] Num frames 10200...
+[2024-09-30 00:28:22,035][1149865] Num frames 10300...
+[2024-09-30 00:28:22,128][1149865] Num frames 10400...
+[2024-09-30 00:28:22,220][1149865] Num frames 10500...
+[2024-09-30 00:28:22,312][1149865] Num frames 10600...
+[2024-09-30 00:28:22,393][1149865] Num frames 10700...
+[2024-09-30 00:28:22,474][1149865] Num frames 10800...
+[2024-09-30 00:28:22,537][1149865] Avg episode rewards: #0: 32.886, true rewards: #0: 13.511
+[2024-09-30 00:28:22,537][1149865] Avg episode reward: 32.886, avg true_objective: 13.511
+[2024-09-30 00:28:22,621][1149865] Num frames 10900...
+[2024-09-30 00:28:22,714][1149865] Num frames 11000...
+[2024-09-30 00:28:22,806][1149865] Num frames 11100...
+[2024-09-30 00:28:22,898][1149865] Num frames 11200...
+[2024-09-30 00:28:22,990][1149865] Num frames 11300...
+[2024-09-30 00:28:23,082][1149865] Num frames 11400...
+[2024-09-30 00:28:23,165][1149865] Num frames 11500...
+[2024-09-30 00:28:23,247][1149865] Num frames 11600...
+[2024-09-30 00:28:23,338][1149865] Num frames 11700...
+[2024-09-30 00:28:23,432][1149865] Num frames 11800...
+[2024-09-30 00:28:23,522][1149865] Num frames 11900...
+[2024-09-30 00:28:23,616][1149865] Num frames 12000...
+[2024-09-30 00:28:23,730][1149865] Num frames 12100...
+[2024-09-30 00:28:23,823][1149865] Num frames 12200...
+[2024-09-30 00:28:23,904][1149865] Num frames 12300...
+[2024-09-30 00:28:23,983][1149865] Num frames 12400...
+[2024-09-30 00:28:24,063][1149865] Num frames 12500...
+[2024-09-30 00:28:24,150][1149865] Num frames 12600...
+[2024-09-30 00:28:24,230][1149865] Num frames 12700...
+[2024-09-30 00:28:24,306][1149865] Num frames 12800...
+[2024-09-30 00:28:24,390][1149865] Avg episode rewards: #0: 35.268, true rewards: #0: 14.268
+[2024-09-30 00:28:24,391][1149865] Avg episode reward: 35.268, avg true_objective: 14.268
+[2024-09-30 00:28:24,438][1149865] Num frames 12900...
+[2024-09-30 00:28:24,516][1149865] Num frames 13000...
+[2024-09-30 00:28:24,594][1149865] Num frames 13100...
+[2024-09-30 00:28:24,675][1149865] Num frames 13200...
+[2024-09-30 00:28:24,755][1149865] Num frames 13300...
+[2024-09-30 00:28:24,833][1149865] Num frames 13400...
+[2024-09-30 00:28:24,946][1149865] Avg episode rewards: #0: 33.076, true rewards: #0: 13.476
+[2024-09-30 00:28:24,946][1149865] Avg episode reward: 33.076, avg true_objective: 13.476
+[2024-09-30 00:28:42,313][1149865] Replay video saved to /home/luyang/workspace/rl/train_dir/default_experiment/replay.mp4!
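
The final checkpoint filename encodes the policy version and env-step count: checkpoint_000000978_4005888.pth is policy version 978 after 4,005,888 environment frames (978 x 4,096 frames per version). The ten evaluation episodes and replay.mp4 above come from Sample Factory's enjoy script; a launch along the following lines reproduces them. The module path and entry point are assumptions based on the sf_examples package, while the flags mirror the "Adding new argument ..." overrides logged earlier.

# A sketch of the evaluation/"enjoy" run behind the episode rollouts above,
# including the push to the Hugging Face Hub.
import sys

from sf_examples.vizdoom.enjoy_vizdoom import main  # assumed entry point

sys.argv = [
    "enjoy_vizdoom",
    "--env=doom_health_gathering_supreme",  # inferred from the repo id
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_frames=100000",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=esperesa/rl_course_vizdoom_health_gathering_supreme",
]
main()
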