[2024-09-29 15:49:05,181][00191] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-29 15:49:05,184][00191] Rollout worker 0 uses device cpu
[2024-09-29 15:49:05,185][00191] Rollout worker 1 uses device cpu
[2024-09-29 15:49:05,187][00191] Rollout worker 2 uses device cpu
[2024-09-29 15:49:05,188][00191] Rollout worker 3 uses device cpu
[2024-09-29 15:49:05,189][00191] Rollout worker 4 uses device cpu
[2024-09-29 15:49:05,190][00191] Rollout worker 5 uses device cpu
[2024-09-29 15:49:05,192][00191] Rollout worker 6 uses device cpu
[2024-09-29 15:49:05,193][00191] Rollout worker 7 uses device cpu
[2024-09-29 15:49:05,347][00191] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 15:49:05,349][00191] InferenceWorker_p0-w0: min num requests: 2
[2024-09-29 15:49:05,386][00191] Starting all processes...
[2024-09-29 15:49:05,388][00191] Starting process learner_proc0
[2024-09-29 15:49:06,048][00191] Starting all processes...
[2024-09-29 15:49:06,058][00191] Starting process inference_proc0-0
[2024-09-29 15:49:06,059][00191] Starting process rollout_proc0
[2024-09-29 15:49:06,060][00191] Starting process rollout_proc1
[2024-09-29 15:49:06,061][00191] Starting process rollout_proc2
[2024-09-29 15:49:06,061][00191] Starting process rollout_proc3
[2024-09-29 15:49:06,061][00191] Starting process rollout_proc4
[2024-09-29 15:49:06,061][00191] Starting process rollout_proc5
[2024-09-29 15:49:06,061][00191] Starting process rollout_proc6
[2024-09-29 15:49:06,061][00191] Starting process rollout_proc7
[2024-09-29 15:49:21,749][05174] Worker 6 uses CPU cores [0]
[2024-09-29 15:49:21,915][05169] Worker 3 uses CPU cores [1]
[2024-09-29 15:49:21,949][05170] Worker 2 uses CPU cores [0]
[2024-09-29 15:49:22,011][05167] Worker 1 uses CPU cores [1]
[2024-09-29 15:49:22,077][05153] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 15:49:22,083][05153] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-29 15:49:22,124][05153] Num visible devices: 1
[2024-09-29 15:49:22,164][05153] Starting seed is not provided
[2024-09-29 15:49:22,165][05153] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 15:49:22,165][05153] Initializing actor-critic model on device cuda:0
[2024-09-29 15:49:22,166][05153] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 15:49:22,169][05153] RunningMeanStd input shape: (1,)
[2024-09-29 15:49:22,203][05166] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 15:49:22,204][05166] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-29 15:49:22,239][05153] ConvEncoder: input_channels=3
[2024-09-29 15:49:22,242][05166] Num visible devices: 1
[2024-09-29 15:49:22,299][05172] Worker 5 uses CPU cores [1]
[2024-09-29 15:49:22,317][05173] Worker 7 uses CPU cores [1]
[2024-09-29 15:49:22,323][05168] Worker 0 uses CPU cores [0]
[2024-09-29 15:49:22,345][05171] Worker 4 uses CPU cores [0]
[2024-09-29 15:49:22,506][05153] Conv encoder output size: 512
[2024-09-29 15:49:22,506][05153] Policy head output size: 512
[2024-09-29 15:49:22,562][05153] Created Actor Critic model with architecture:
[2024-09-29 15:49:22,563][05153] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
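This startup sequence (eight CPU rollout workers feeding one GPU inference worker and one GPU learner, everything under /content/train_dir/default_experiment) matches the Sample Factory PPO setup from the Hugging Face Deep RL course Colab. A minimal launch sketch under that assumption; register_vizdoom_components and parse_vizdoom_cfg are helper names from the course notebook, not shown in this log:

```python
# Minimal sketch of how a run like this is typically launched (assumes the
# register_vizdoom_components / parse_vizdoom_cfg helpers defined in the HF
# Deep RL course notebook; sample_factory >= 2.0 provides run_rl).
from sample_factory.train import run_rl

register_vizdoom_components()  # assumed helper: registers Doom envs and the VizdoomEncoder

env = "doom_health_gathering_supreme"  # assumed scenario; the log only shows the Doom resolution
cfg = parse_vizdoom_cfg(
    argv=[
        f"--env={env}",
        "--num_workers=8",             # matches the 8 rollout workers above
        "--num_envs_per_worker=4",     # assumed notebook default
        "--train_for_env_steps=4000000",
    ]
)
status = run_rl(cfg)  # writes config.json and emits logs like the ones here
```

The printed module tree maps onto roughly the following PyTorch skeleton. Only the 512-dim encoder output, the GRU(512, 512) core, the 1-unit value head, and the 5-way action head are stated in the log; the conv kernel and channel sizes are assumptions matching Sample Factory's default encoder:

```python
import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Hand-written approximation of the printed ActorCriticSharedWeights."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d+ELU blocks; kernel/stride/channel values are assumed
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # mlp_layers: Linear+ELU projecting the flattened conv features to 512
        with torch.no_grad():
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                            # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)                  # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)     # seq len 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```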
[2024-09-29 15:49:22,944][05153] Using optimizer
[2024-09-29 15:49:23,600][05153] No checkpoints found
[2024-09-29 15:49:23,600][05153] Did not load from checkpoint, starting from scratch!
[2024-09-29 15:49:23,601][05153] Initialized policy 0 weights for model version 0
[2024-09-29 15:49:23,605][05153] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-29 15:49:23,611][05153] LearnerWorker_p0 finished initialization!
[2024-09-29 15:49:23,796][05166] RunningMeanStd input shape: (3, 72, 128)
[2024-09-29 15:49:23,797][05166] RunningMeanStd input shape: (1,)
[2024-09-29 15:49:23,809][05166] ConvEncoder: input_channels=3
[2024-09-29 15:49:23,910][05166] Conv encoder output size: 512
[2024-09-29 15:49:23,911][05166] Policy head output size: 512
[2024-09-29 15:49:23,961][00191] Inference worker 0-0 is ready!
[2024-09-29 15:49:23,962][00191] All inference workers are ready! Signal rollout workers to start!
[2024-09-29 15:49:24,158][05172] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,153][05167] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,163][05174] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,162][05169] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,163][05173] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,165][05168] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,157][05171] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,161][05170] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-29 15:49:24,931][00191] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 15:49:25,182][05168] Decorrelating experience for 0 frames...
[2024-09-29 15:49:25,184][05174] Decorrelating experience for 0 frames...
[2024-09-29 15:49:25,340][00191] Heartbeat connected on Batcher_0
[2024-09-29 15:49:25,345][00191] Heartbeat connected on LearnerWorker_p0
[2024-09-29 15:49:25,374][00191] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-29 15:49:25,525][05172] Decorrelating experience for 0 frames...
[2024-09-29 15:49:25,525][05169] Decorrelating experience for 0 frames...
[2024-09-29 15:49:25,529][05167] Decorrelating experience for 0 frames...
[2024-09-29 15:49:25,915][05167] Decorrelating experience for 32 frames...
[2024-09-29 15:49:26,317][05168] Decorrelating experience for 32 frames...
[2024-09-29 15:49:26,447][05171] Decorrelating experience for 0 frames...
[2024-09-29 15:49:26,480][05167] Decorrelating experience for 64 frames...
[2024-09-29 15:49:27,023][05174] Decorrelating experience for 32 frames...
[2024-09-29 15:49:27,038][05170] Decorrelating experience for 0 frames...
[2024-09-29 15:49:28,085][05172] Decorrelating experience for 32 frames...
[2024-09-29 15:49:28,144][05171] Decorrelating experience for 32 frames...
[2024-09-29 15:49:28,595][05167] Decorrelating experience for 96 frames...
[2024-09-29 15:49:28,901][00191] Heartbeat connected on RolloutWorker_w1
[2024-09-29 15:49:28,913][05168] Decorrelating experience for 64 frames...
[2024-09-29 15:49:28,947][05170] Decorrelating experience for 32 frames...
[2024-09-29 15:49:29,070][05173] Decorrelating experience for 0 frames...
[2024-09-29 15:49:29,902][05169] Decorrelating experience for 32 frames...
[2024-09-29 15:49:29,931][00191] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 15:49:30,406][05172] Decorrelating experience for 64 frames...
[2024-09-29 15:49:31,549][05171] Decorrelating experience for 64 frames...
[2024-09-29 15:49:31,817][05174] Decorrelating experience for 64 frames...
[2024-09-29 15:49:31,977][05168] Decorrelating experience for 96 frames...
[2024-09-29 15:49:32,267][00191] Heartbeat connected on RolloutWorker_w0
[2024-09-29 15:49:32,451][05170] Decorrelating experience for 64 frames...
[2024-09-29 15:49:32,693][05173] Decorrelating experience for 32 frames...
[2024-09-29 15:49:33,994][05174] Decorrelating experience for 96 frames...
[2024-09-29 15:49:34,126][05169] Decorrelating experience for 64 frames...
[2024-09-29 15:49:34,278][00191] Heartbeat connected on RolloutWorker_w6
[2024-09-29 15:49:34,348][05172] Decorrelating experience for 96 frames...
[2024-09-29 15:49:34,547][05170] Decorrelating experience for 96 frames...
[2024-09-29 15:49:34,781][00191] Heartbeat connected on RolloutWorker_w5
[2024-09-29 15:49:34,797][00191] Heartbeat connected on RolloutWorker_w2
[2024-09-29 15:49:34,931][00191] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 37.0. Samples: 370. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 15:49:34,936][00191] Avg episode reward: [(0, '2.559')]
[2024-09-29 15:49:36,018][05173] Decorrelating experience for 64 frames...
[2024-09-29 15:49:36,391][05171] Decorrelating experience for 96 frames...
[2024-09-29 15:49:37,010][00191] Heartbeat connected on RolloutWorker_w4
[2024-09-29 15:49:38,505][05169] Decorrelating experience for 96 frames...
[2024-09-29 15:49:38,599][05153] Signal inference workers to stop experience collection...
[2024-09-29 15:49:38,596][05173] Decorrelating experience for 96 frames...
[2024-09-29 15:49:38,642][05166] InferenceWorker_p0-w0: stopping experience collection
[2024-09-29 15:49:38,695][00191] Heartbeat connected on RolloutWorker_w3
[2024-09-29 15:49:38,762][00191] Heartbeat connected on RolloutWorker_w7
[2024-09-29 15:49:39,931][00191] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 206.8. Samples: 3102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-29 15:49:39,937][00191] Avg episode reward: [(0, '2.844')]
[2024-09-29 15:49:40,748][05153] Signal inference workers to resume experience collection...
[2024-09-29 15:49:40,753][05166] InferenceWorker_p0-w0: resuming experience collection
[2024-09-29 15:49:44,936][00191] Fps is (10 sec: 2047.0, 60 sec: 1023.7, 300 sec: 1023.7). Total num frames: 20480. Throughput: 0: 191.0. Samples: 3820. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-09-29 15:49:44,941][00191] Avg episode reward: [(0, '3.301')]
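The "Decorrelating experience" entries show each rollout worker warming its environments up by a different multiple of 32 frames (0, 32, 64, 96) before collection starts, so the eight workers' episodes are staggered rather than all beginning at an episode boundary. A toy illustration of the idea, not Sample Factory's actual implementation (the Gymnasium-style step API is an assumption):

```python
# Toy sketch: burn a worker-dependent number of random-action frames so that
# rollouts from different workers are out of phase with one another.
def decorrelate(env, worker_idx: int, frames_per_step: int = 32):
    env.reset()
    for _ in range((worker_idx % 4) * frames_per_step):  # 0, 32, 64 or 96 frames
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()
```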
[2024-09-29 15:49:49,931][00191] Fps is (10 sec: 3686.3, 60 sec: 1474.5, 300 sec: 1474.5). Total num frames: 36864. Throughput: 0: 345.0. Samples: 8624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:49:49,936][00191] Avg episode reward: [(0, '3.664')]
[2024-09-29 15:49:50,751][05166] Updated weights for policy 0, policy_version 10 (0.0048)
[2024-09-29 15:49:54,931][00191] Fps is (10 sec: 4097.9, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 485.9. Samples: 14578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 15:49:54,938][00191] Avg episode reward: [(0, '4.352')]
[2024-09-29 15:49:59,426][05166] Updated weights for policy 0, policy_version 20 (0.0024)
[2024-09-29 15:49:59,931][00191] Fps is (10 sec: 4505.7, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 519.1. Samples: 18170. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 15:49:59,938][00191] Avg episode reward: [(0, '4.723')]
[2024-09-29 15:50:04,931][00191] Fps is (10 sec: 3686.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 600.0. Samples: 24002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:50:04,936][00191] Avg episode reward: [(0, '4.487')]
[2024-09-29 15:50:09,931][00191] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 114688. Throughput: 0: 644.6. Samples: 29008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:50:09,934][00191] Avg episode reward: [(0, '4.105')]
[2024-09-29 15:50:09,942][05153] Saving new best policy, reward=4.105!
[2024-09-29 15:50:10,894][05166] Updated weights for policy 0, policy_version 30 (0.0030)
[2024-09-29 15:50:14,931][00191] Fps is (10 sec: 4096.0, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 716.2. Samples: 32230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:50:14,937][00191] Avg episode reward: [(0, '4.482')]
[2024-09-29 15:50:14,943][05153] Saving new best policy, reward=4.482!
[2024-09-29 15:50:19,933][00191] Fps is (10 sec: 4095.4, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 155648. Throughput: 0: 859.6. Samples: 39054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:50:19,936][00191] Avg episode reward: [(0, '4.632')]
[2024-09-29 15:50:19,998][05153] Saving new best policy, reward=4.632!
[2024-09-29 15:50:21,629][05166] Updated weights for policy 0, policy_version 40 (0.0021)
[2024-09-29 15:50:24,931][00191] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 893.6. Samples: 43314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:50:24,938][00191] Avg episode reward: [(0, '4.450')]
[2024-09-29 15:50:29,939][00191] Fps is (10 sec: 4093.5, 60 sec: 3276.4, 300 sec: 3024.4). Total num frames: 196608. Throughput: 0: 953.3. Samples: 46722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 15:50:29,941][00191] Avg episode reward: [(0, '4.332')]
[2024-09-29 15:50:31,027][05166] Updated weights for policy 0, policy_version 50 (0.0036)
[2024-09-29 15:50:34,933][00191] Fps is (10 sec: 4914.3, 60 sec: 3686.3, 300 sec: 3159.7). Total num frames: 221184. Throughput: 0: 1005.8. Samples: 53888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 15:50:34,937][00191] Avg episode reward: [(0, '4.357')]
[2024-09-29 15:50:39,931][00191] Fps is (10 sec: 3689.2, 60 sec: 3891.2, 300 sec: 3113.0). Total num frames: 233472. Throughput: 0: 983.4. Samples: 58830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 15:50:39,937][00191] Avg episode reward: [(0, '4.365')]
[2024-09-29 15:50:42,629][05166] Updated weights for policy 0, policy_version 60 (0.0028)
[2024-09-29 15:50:44,931][00191] Fps is (10 sec: 3277.4, 60 sec: 3891.5, 300 sec: 3174.4). Total num frames: 253952. Throughput: 0: 955.7. Samples: 61176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:50:44,937][00191] Avg episode reward: [(0, '4.468')]
[2024-09-29 15:50:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 985.4. Samples: 68346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:50:49,936][00191] Avg episode reward: [(0, '4.519')]
[2024-09-29 15:50:51,255][05166] Updated weights for policy 0, policy_version 70 (0.0021)
[2024-09-29 15:50:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 294912. Throughput: 0: 1004.2. Samples: 74196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:50:54,934][00191] Avg episode reward: [(0, '4.502')]
[2024-09-29 15:50:59,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 311296. Throughput: 0: 980.0. Samples: 76330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:50:59,939][00191] Avg episode reward: [(0, '4.368')]
[2024-09-29 15:50:59,947][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth...
[2024-09-29 15:51:02,777][05166] Updated weights for policy 0, policy_version 80 (0.0036)
[2024-09-29 15:51:04,931][00191] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 970.6. Samples: 82730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:51:04,934][00191] Avg episode reward: [(0, '4.287')]
[2024-09-29 15:51:09,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 1025.6. Samples: 89464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:51:09,933][00191] Avg episode reward: [(0, '4.269')]
[2024-09-29 15:51:13,978][05166] Updated weights for policy 0, policy_version 90 (0.0055)
[2024-09-29 15:51:14,933][00191] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3351.2). Total num frames: 368640. Throughput: 0: 995.7. Samples: 91524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:51:14,935][00191] Avg episode reward: [(0, '4.264')]
[2024-09-29 15:51:19,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 954.1. Samples: 96822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:51:19,933][00191] Avg episode reward: [(0, '4.571')]
[2024-09-29 15:51:23,247][05166] Updated weights for policy 0, policy_version 100 (0.0045)
[2024-09-29 15:51:24,932][00191] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 1004.0. Samples: 104010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:51:24,937][00191] Avg episode reward: [(0, '4.714')]
[2024-09-29 15:51:24,956][05153] Saving new best policy, reward=4.714!
[2024-09-29 15:51:29,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.7, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 1015.5. Samples: 106874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 15:51:29,936][00191] Avg episode reward: [(0, '4.681')]
[2024-09-29 15:51:34,446][05166] Updated weights for policy 0, policy_version 110 (0.0019)
[2024-09-29 15:51:34,931][00191] Fps is (10 sec: 3686.6, 60 sec: 3823.0, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 954.6. Samples: 111304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:51:34,934][00191] Avg episode reward: [(0, '4.717')]
[2024-09-29 15:51:34,939][05153] Saving new best policy, reward=4.717!
[2024-09-29 15:51:39,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3519.5). Total num frames: 475136. Throughput: 0: 982.1. Samples: 118390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:51:39,939][00191] Avg episode reward: [(0, '4.711')]
[2024-09-29 15:51:44,026][05166] Updated weights for policy 0, policy_version 120 (0.0031)
[2024-09-29 15:51:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 1012.5. Samples: 121892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:51:44,933][00191] Avg episode reward: [(0, '4.703')]
[2024-09-29 15:51:49,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 970.9. Samples: 126420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:51:49,938][00191] Avg episode reward: [(0, '4.723')]
[2024-09-29 15:51:49,948][05153] Saving new best policy, reward=4.723!
[2024-09-29 15:51:54,872][05166] Updated weights for policy 0, policy_version 130 (0.0038)
[2024-09-29 15:51:54,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 532480. Throughput: 0: 958.1. Samples: 132578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:51:54,934][00191] Avg episode reward: [(0, '4.676')]
[2024-09-29 15:51:59,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 992.5. Samples: 136186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:51:59,935][00191] Avg episode reward: [(0, '4.553')]
[2024-09-29 15:52:04,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 999.6. Samples: 141806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:52:04,938][00191] Avg episode reward: [(0, '4.543')]
[2024-09-29 15:52:06,046][05166] Updated weights for policy 0, policy_version 140 (0.0039)
[2024-09-29 15:52:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 955.9. Samples: 147024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 15:52:09,933][00191] Avg episode reward: [(0, '4.675')]
[2024-09-29 15:52:14,931][00191] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 971.2. Samples: 150576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 15:52:14,934][00191] Avg episode reward: [(0, '4.704')]
[2024-09-29 15:52:15,216][05166] Updated weights for policy 0, policy_version 150 (0.0049)
[2024-09-29 15:52:19,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 1016.6. Samples: 157052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:52:19,937][00191] Avg episode reward: [(0, '4.877')]
[2024-09-29 15:52:19,949][05153] Saving new best policy, reward=4.877!
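Each "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" line averages frame throughput over three sliding windows of the total frame counter, which is why the 60 s and 300 s columns lag the 10 s column early in training (and read nan before any data exists). A minimal sketch of that bookkeeping, consistent with e.g. 20480 frames in the first ~10 s window giving roughly 2047 FPS:

```python
import time
from collections import deque

history = deque()  # (timestamp, total_frames) samples, appended at every report

def report_fps(total_frames: int, windows=(10, 60, 300)):
    now = time.time()
    history.append((now, total_frames))
    rates = []
    for w in windows:
        # oldest sample that still falls inside the window, else the oldest overall
        t0, f0 = next(((t, f) for t, f in history if t >= now - w), history[0])
        dt = now - t0
        rates.append((total_frames - f0) / dt if dt > 0 else float("nan"))
    return rates
```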
[2024-09-29 15:52:24,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3595.4). Total num frames: 647168. Throughput: 0: 957.4. Samples: 161474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:52:24,938][00191] Avg episode reward: [(0, '4.718')]
[2024-09-29 15:52:26,491][05166] Updated weights for policy 0, policy_version 160 (0.0030)
[2024-09-29 15:52:29,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3608.9). Total num frames: 667648. Throughput: 0: 959.3. Samples: 165060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:52:29,936][00191] Avg episode reward: [(0, '4.504')]
[2024-09-29 15:52:34,936][00191] Fps is (10 sec: 4503.6, 60 sec: 4027.4, 300 sec: 3643.2). Total num frames: 692224. Throughput: 0: 1016.7. Samples: 172178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:52:34,938][00191] Avg episode reward: [(0, '5.020')]
[2024-09-29 15:52:34,940][05153] Saving new best policy, reward=5.020!
[2024-09-29 15:52:35,833][05166] Updated weights for policy 0, policy_version 170 (0.0043)
[2024-09-29 15:52:39,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 981.6. Samples: 176750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:52:39,938][00191] Avg episode reward: [(0, '5.194')]
[2024-09-29 15:52:39,951][05153] Saving new best policy, reward=5.194!
[2024-09-29 15:52:44,931][00191] Fps is (10 sec: 3278.2, 60 sec: 3891.2, 300 sec: 3625.0). Total num frames: 724992. Throughput: 0: 962.9. Samples: 179516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:52:44,933][00191] Avg episode reward: [(0, '4.952')]
[2024-09-29 15:52:46,694][05166] Updated weights for policy 0, policy_version 180 (0.0033)
[2024-09-29 15:52:49,931][00191] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3656.4). Total num frames: 749568. Throughput: 0: 992.0. Samples: 186446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:52:49,934][00191] Avg episode reward: [(0, '4.852')]
[2024-09-29 15:52:54,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3647.4). Total num frames: 765952. Throughput: 0: 999.7. Samples: 192012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:52:54,934][00191] Avg episode reward: [(0, '4.821')]
[2024-09-29 15:52:57,846][05166] Updated weights for policy 0, policy_version 190 (0.0032)
[2024-09-29 15:52:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3657.8). Total num frames: 786432. Throughput: 0: 970.1. Samples: 194232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:52:59,936][00191] Avg episode reward: [(0, '5.111')]
[2024-09-29 15:52:59,945][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
[2024-09-29 15:53:04,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3667.8). Total num frames: 806912. Throughput: 0: 978.4. Samples: 201080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:53:04,936][00191] Avg episode reward: [(0, '5.469')]
[2024-09-29 15:53:04,938][05153] Saving new best policy, reward=5.469!
[2024-09-29 15:53:06,711][05166] Updated weights for policy 0, policy_version 200 (0.0026)
[2024-09-29 15:53:09,935][00191] Fps is (10 sec: 4094.5, 60 sec: 3959.2, 300 sec: 3677.2). Total num frames: 827392. Throughput: 0: 1022.5. Samples: 207488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:53:09,938][00191] Avg episode reward: [(0, '5.560')]
[2024-09-29 15:53:09,950][05153] Saving new best policy, reward=5.560!
[2024-09-29 15:53:14,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3668.6). Total num frames: 843776. Throughput: 0: 989.0. Samples: 209566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:53:14,934][00191] Avg episode reward: [(0, '5.574')]
[2024-09-29 15:53:14,936][05153] Saving new best policy, reward=5.574!
[2024-09-29 15:53:18,293][05166] Updated weights for policy 0, policy_version 210 (0.0038)
[2024-09-29 15:53:19,931][00191] Fps is (10 sec: 3687.6, 60 sec: 3891.2, 300 sec: 3677.7). Total num frames: 864256. Throughput: 0: 957.9. Samples: 215278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:53:19,934][00191] Avg episode reward: [(0, '5.471')]
[2024-09-29 15:53:24,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3703.5). Total num frames: 888832. Throughput: 0: 1016.9. Samples: 222512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:53:24,934][00191] Avg episode reward: [(0, '5.514')]
[2024-09-29 15:53:28,043][05166] Updated weights for policy 0, policy_version 220 (0.0026)
[2024-09-29 15:53:29,937][00191] Fps is (10 sec: 4093.8, 60 sec: 3959.1, 300 sec: 3694.7). Total num frames: 905216. Throughput: 0: 1013.4. Samples: 225124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:53:29,939][00191] Avg episode reward: [(0, '5.419')]
[2024-09-29 15:53:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3702.8). Total num frames: 925696. Throughput: 0: 970.7. Samples: 230128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:53:34,938][00191] Avg episode reward: [(0, '5.799')]
[2024-09-29 15:53:34,940][05153] Saving new best policy, reward=5.799!
[2024-09-29 15:53:37,987][05166] Updated weights for policy 0, policy_version 230 (0.0040)
[2024-09-29 15:53:39,931][00191] Fps is (10 sec: 4508.1, 60 sec: 4096.0, 300 sec: 3726.6). Total num frames: 950272. Throughput: 0: 1004.6. Samples: 237220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:53:39,933][00191] Avg episode reward: [(0, '5.829')]
[2024-09-29 15:53:39,943][05153] Saving new best policy, reward=5.829!
[2024-09-29 15:53:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3717.9). Total num frames: 966656. Throughput: 0: 1031.2. Samples: 240638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:53:44,934][00191] Avg episode reward: [(0, '5.453')]
[2024-09-29 15:53:49,647][05166] Updated weights for policy 0, policy_version 240 (0.0020)
[2024-09-29 15:53:49,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3709.6). Total num frames: 983040. Throughput: 0: 973.9. Samples: 244904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:53:49,934][00191] Avg episode reward: [(0, '5.692')]
[2024-09-29 15:53:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3731.9). Total num frames: 1007616. Throughput: 0: 977.6. Samples: 251476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:53:54,935][00191] Avg episode reward: [(0, '5.735')]
[2024-09-29 15:53:58,153][05166] Updated weights for policy 0, policy_version 250 (0.0027)
[2024-09-29 15:53:59,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3738.5). Total num frames: 1028096. Throughput: 0: 1011.4. Samples: 255078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-29 15:53:59,936][00191] Avg episode reward: [(0, '5.589')]
[2024-09-29 15:54:04,938][00191] Fps is (10 sec: 3684.0, 60 sec: 3959.0, 300 sec: 3730.2). Total num frames: 1044480. Throughput: 0: 1000.4. Samples: 260302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:54:04,942][00191] Avg episode reward: [(0, '5.789')]
[2024-09-29 15:54:09,845][05166] Updated weights for policy 0, policy_version 260 (0.0031)
[2024-09-29 15:54:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 961.2. Samples: 265768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:54:09,934][00191] Avg episode reward: [(0, '5.461')]
[2024-09-29 15:54:14,931][00191] Fps is (10 sec: 4098.6, 60 sec: 4027.7, 300 sec: 3742.9). Total num frames: 1085440. Throughput: 0: 981.6. Samples: 269292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:54:14,935][00191] Avg episode reward: [(0, '5.681')]
[2024-09-29 15:54:19,939][00191] Fps is (10 sec: 3683.6, 60 sec: 3959.0, 300 sec: 3734.9). Total num frames: 1101824. Throughput: 0: 1009.9. Samples: 275580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:54:19,943][00191] Avg episode reward: [(0, '5.758')]
[2024-09-29 15:54:20,129][05166] Updated weights for policy 0, policy_version 270 (0.0021)
[2024-09-29 15:54:24,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 954.3. Samples: 280162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:54:24,935][00191] Avg episode reward: [(0, '5.638')]
[2024-09-29 15:54:29,836][05166] Updated weights for policy 0, policy_version 280 (0.0041)
[2024-09-29 15:54:29,931][00191] Fps is (10 sec: 4509.0, 60 sec: 4028.1, 300 sec: 3887.7). Total num frames: 1146880. Throughput: 0: 959.1. Samples: 283796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:54:29,934][00191] Avg episode reward: [(0, '5.973')]
[2024-09-29 15:54:29,944][05153] Saving new best policy, reward=5.973!
[2024-09-29 15:54:34,934][00191] Fps is (10 sec: 4504.4, 60 sec: 4027.6, 300 sec: 3957.1). Total num frames: 1167360. Throughput: 0: 1023.5. Samples: 290964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:54:34,943][00191] Avg episode reward: [(0, '6.115')]
[2024-09-29 15:54:34,947][05153] Saving new best policy, reward=6.115!
[2024-09-29 15:54:39,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1179648. Throughput: 0: 976.6. Samples: 295422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:54:39,937][00191] Avg episode reward: [(0, '6.182')]
[2024-09-29 15:54:39,948][05153] Saving new best policy, reward=6.182!
[2024-09-29 15:54:41,406][05166] Updated weights for policy 0, policy_version 290 (0.0037)
[2024-09-29 15:54:44,931][00191] Fps is (10 sec: 3277.6, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1200128. Throughput: 0: 958.8. Samples: 298226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:54:44,934][00191] Avg episode reward: [(0, '6.120')]
[2024-09-29 15:54:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1224704. Throughput: 0: 998.4. Samples: 305222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:54:49,934][00191] Avg episode reward: [(0, '6.299')]
[2024-09-29 15:54:49,945][05153] Saving new best policy, reward=6.299!
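"Policy #0 lag" summarizes how stale the experience entering each batch is: for every rollout in the batch, the learner's current policy_version minus the version that generated it (the -1.0 values at startup just mean no versioned data exists yet). An illustration of the assumed bookkeeping:

```python
# Assumed semantics of the lag statistic: current learner version minus the
# policy_version that produced each rollout in the batch.
learner_version = 290
rollout_versions = [290, 289, 288, 290, 290, 289]
lags = [learner_version - v for v in rollout_versions]
print(f"lag: (min: {min(lags):.1f}, avg: {sum(lags) / len(lags):.1f}, max: {max(lags):.1f})")
# -> lag: (min: 0.0, avg: 0.7, max: 2.0)
```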
[2024-09-29 15:54:50,229][05166] Updated weights for policy 0, policy_version 300 (0.0032)
[2024-09-29 15:54:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1241088. Throughput: 0: 995.0. Samples: 310544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 15:54:54,936][00191] Avg episode reward: [(0, '6.475')]
[2024-09-29 15:54:54,940][05153] Saving new best policy, reward=6.475!
[2024-09-29 15:54:59,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1257472. Throughput: 0: 964.0. Samples: 312672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:54:59,937][00191] Avg episode reward: [(0, '6.604')]
[2024-09-29 15:54:59,972][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth...
[2024-09-29 15:55:00,104][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth
[2024-09-29 15:55:00,121][05153] Saving new best policy, reward=6.604!
[2024-09-29 15:55:02,014][05166] Updated weights for policy 0, policy_version 310 (0.0040)
[2024-09-29 15:55:04,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3957.2). Total num frames: 1282048. Throughput: 0: 970.3. Samples: 319236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:55:04,934][00191] Avg episode reward: [(0, '6.852')]
[2024-09-29 15:55:04,938][05153] Saving new best policy, reward=6.852!
[2024-09-29 15:55:09,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1298432. Throughput: 0: 1008.4. Samples: 325540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-29 15:55:09,938][00191] Avg episode reward: [(0, '6.987')]
[2024-09-29 15:55:09,952][05153] Saving new best policy, reward=6.987!
[2024-09-29 15:55:14,931][00191] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1306624. Throughput: 0: 950.6. Samples: 326574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:55:14,934][00191] Avg episode reward: [(0, '7.160')]
[2024-09-29 15:55:14,939][05153] Saving new best policy, reward=7.160!
[2024-09-29 15:55:16,238][05166] Updated weights for policy 0, policy_version 320 (0.0031)
[2024-09-29 15:55:19,933][00191] Fps is (10 sec: 2866.7, 60 sec: 3755.0, 300 sec: 3915.5). Total num frames: 1327104. Throughput: 0: 875.0. Samples: 330338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:55:19,943][00191] Avg episode reward: [(0, '7.535')]
[2024-09-29 15:55:19,952][05153] Saving new best policy, reward=7.535!
[2024-09-29 15:55:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3901.7). Total num frames: 1347584. Throughput: 0: 931.1. Samples: 337320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:55:24,935][00191] Avg episode reward: [(0, '7.648')]
[2024-09-29 15:55:24,939][05153] Saving new best policy, reward=7.648!
[2024-09-29 15:55:25,734][05166] Updated weights for policy 0, policy_version 330 (0.0033)
[2024-09-29 15:55:29,931][00191] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3860.0). Total num frames: 1359872. Throughput: 0: 915.7. Samples: 339434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:55:29,934][00191] Avg episode reward: [(0, '7.365')]
[2024-09-29 15:55:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3901.6). Total num frames: 1384448. Throughput: 0: 871.2. Samples: 344426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:55:34,934][00191] Avg episode reward: [(0, '7.321')]
[2024-09-29 15:55:36,612][05166] Updated weights for policy 0, policy_version 340 (0.0021)
[2024-09-29 15:55:39,931][00191] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 1404928. Throughput: 0: 910.3. Samples: 351506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:55:39,937][00191] Avg episode reward: [(0, '7.568')]
[2024-09-29 15:55:44,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 1421312. Throughput: 0: 933.8. Samples: 354694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:55:44,936][00191] Avg episode reward: [(0, '7.919')]
[2024-09-29 15:55:44,943][05153] Saving new best policy, reward=7.919!
[2024-09-29 15:55:48,215][05166] Updated weights for policy 0, policy_version 350 (0.0025)
[2024-09-29 15:55:49,931][00191] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3873.8). Total num frames: 1437696. Throughput: 0: 879.9. Samples: 358830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:55:49,933][00191] Avg episode reward: [(0, '8.108')]
[2024-09-29 15:55:49,947][05153] Saving new best policy, reward=8.108!
[2024-09-29 15:55:54,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1462272. Throughput: 0: 884.7. Samples: 365350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:55:54,936][00191] Avg episode reward: [(0, '8.057')]
[2024-09-29 15:55:57,189][05166] Updated weights for policy 0, policy_version 360 (0.0031)
[2024-09-29 15:55:59,931][00191] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 1482752. Throughput: 0: 940.2. Samples: 368884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:55:59,935][00191] Avg episode reward: [(0, '8.826')]
[2024-09-29 15:55:59,946][05153] Saving new best policy, reward=8.826!
[2024-09-29 15:56:04,932][00191] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3860.0). Total num frames: 1495040. Throughput: 0: 967.8. Samples: 373888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:56:04,938][00191] Avg episode reward: [(0, '8.433')]
[2024-09-29 15:56:08,956][05166] Updated weights for policy 0, policy_version 370 (0.0022)
[2024-09-29 15:56:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 1519616. Throughput: 0: 936.4. Samples: 379456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:56:09,933][00191] Avg episode reward: [(0, '9.402')]
[2024-09-29 15:56:09,944][05153] Saving new best policy, reward=9.402!
[2024-09-29 15:56:14,931][00191] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1540096. Throughput: 0: 967.4. Samples: 382968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-29 15:56:14,933][00191] Avg episode reward: [(0, '9.397')]
[2024-09-29 15:56:18,908][05166] Updated weights for policy 0, policy_version 380 (0.0033)
[2024-09-29 15:56:19,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 1556480. Throughput: 0: 991.0. Samples: 389022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:56:19,937][00191] Avg episode reward: [(0, '9.600')]
[2024-09-29 15:56:19,949][05153] Saving new best policy, reward=9.600!
[2024-09-29 15:56:24,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1572864. Throughput: 0: 935.0. Samples: 393580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:56:24,933][00191] Avg episode reward: [(0, '10.175')]
[2024-09-29 15:56:24,940][05153] Saving new best policy, reward=10.175!
[2024-09-29 15:56:29,472][05166] Updated weights for policy 0, policy_version 390 (0.0025)
[2024-09-29 15:56:29,931][00191] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1597440. Throughput: 0: 941.1. Samples: 397042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:56:29,936][00191] Avg episode reward: [(0, '9.817')]
[2024-09-29 15:56:34,931][00191] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1617920. Throughput: 0: 998.4. Samples: 403758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:56:34,938][00191] Avg episode reward: [(0, '10.396')]
[2024-09-29 15:56:34,944][05153] Saving new best policy, reward=10.396!
[2024-09-29 15:56:39,934][00191] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3859.9). Total num frames: 1630208. Throughput: 0: 948.7. Samples: 408042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 15:56:39,935][00191] Avg episode reward: [(0, '10.262')]
[2024-09-29 15:56:41,263][05166] Updated weights for policy 0, policy_version 400 (0.0028)
[2024-09-29 15:56:44,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1654784. Throughput: 0: 937.4. Samples: 411066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:56:44,938][00191] Avg episode reward: [(0, '10.518')]
[2024-09-29 15:56:44,941][05153] Saving new best policy, reward=10.518!
[2024-09-29 15:56:49,931][00191] Fps is (10 sec: 4506.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1675264. Throughput: 0: 980.2. Samples: 417998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:56:49,938][00191] Avg episode reward: [(0, '10.367')]
[2024-09-29 15:56:50,064][05166] Updated weights for policy 0, policy_version 410 (0.0037)
[2024-09-29 15:56:54,931][00191] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1691648. Throughput: 0: 974.0. Samples: 423286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:56:54,936][00191] Avg episode reward: [(0, '10.617')]
[2024-09-29 15:56:54,939][05153] Saving new best policy, reward=10.617!
[2024-09-29 15:56:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1712128. Throughput: 0: 944.6. Samples: 425476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:56:59,933][00191] Avg episode reward: [(0, '10.446')]
[2024-09-29 15:56:59,940][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth...
[2024-09-29 15:57:00,061][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth
[2024-09-29 15:57:01,540][05166] Updated weights for policy 0, policy_version 420 (0.0028)
[2024-09-29 15:57:04,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1732608. Throughput: 0: 964.4. Samples: 432418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:04,933][00191] Avg episode reward: [(0, '10.889')]
[2024-09-29 15:57:04,936][05153] Saving new best policy, reward=10.889!
[2024-09-29 15:57:09,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1753088. Throughput: 0: 997.7. Samples: 438476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:57:09,936][00191] Avg episode reward: [(0, '11.505')]
[2024-09-29 15:57:09,949][05153] Saving new best policy, reward=11.505!
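The Saving/Removing pairs show the learner rotating regular checkpoints, keeping only the most recent ones (checkpoint_000000418_1712128.pth arrives, checkpoint_000000192_786432.pth goes), while "Saving new best policy" snapshots are kept separately whenever the average reward improves. A hypothetical sketch of the rotation, assuming the last two checkpoints are kept:

```python
import os

kept: list[str] = []  # paths of regular checkpoints, oldest first

def rotate_checkpoints(new_path: str, keep_last: int = 2) -> None:
    # Hypothetical helper mirroring the Saving/Removing pairs in the log;
    # the learner itself writes the .pth file before this bookkeeping runs.
    kept.append(new_path)
    while len(kept) > keep_last:
        stale = kept.pop(0)
        print(f"Removing {stale}")
        os.remove(stale)
```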
[2024-09-29 15:57:12,491][05166] Updated weights for policy 0, policy_version 430 (0.0043)
[2024-09-29 15:57:14,933][00191] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 1769472. Throughput: 0: 964.8. Samples: 440458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:57:14,935][00191] Avg episode reward: [(0, '12.250')]
[2024-09-29 15:57:14,939][05153] Saving new best policy, reward=12.250!
[2024-09-29 15:57:19,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1789952. Throughput: 0: 948.7. Samples: 446450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:19,938][00191] Avg episode reward: [(0, '13.159')]
[2024-09-29 15:57:19,951][05153] Saving new best policy, reward=13.159!
[2024-09-29 15:57:21,788][05166] Updated weights for policy 0, policy_version 440 (0.0025)
[2024-09-29 15:57:24,936][00191] Fps is (10 sec: 4504.4, 60 sec: 4027.4, 300 sec: 3887.7). Total num frames: 1814528. Throughput: 0: 1012.0. Samples: 453582. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-29 15:57:24,941][00191] Avg episode reward: [(0, '13.509')]
[2024-09-29 15:57:24,949][05153] Saving new best policy, reward=13.509!
[2024-09-29 15:57:29,931][00191] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1826816. Throughput: 0: 991.1. Samples: 455666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:57:29,933][00191] Avg episode reward: [(0, '13.593')]
[2024-09-29 15:57:29,941][05153] Saving new best policy, reward=13.593!
[2024-09-29 15:57:33,288][05166] Updated weights for policy 0, policy_version 450 (0.0032)
[2024-09-29 15:57:34,931][00191] Fps is (10 sec: 3278.3, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 1847296. Throughput: 0: 954.5. Samples: 460950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:57:34,934][00191] Avg episode reward: [(0, '12.714')]
[2024-09-29 15:57:39,931][00191] Fps is (10 sec: 4505.7, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 1871872. Throughput: 0: 998.6. Samples: 468222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:39,937][00191] Avg episode reward: [(0, '12.889')]
[2024-09-29 15:57:42,487][05166] Updated weights for policy 0, policy_version 460 (0.0038)
[2024-09-29 15:57:44,934][00191] Fps is (10 sec: 4094.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1888256. Throughput: 0: 1018.5. Samples: 471312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:44,941][00191] Avg episode reward: [(0, '12.416')]
[2024-09-29 15:57:49,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1908736. Throughput: 0: 958.3. Samples: 475540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:49,934][00191] Avg episode reward: [(0, '12.715')]
[2024-09-29 15:57:53,377][05166] Updated weights for policy 0, policy_version 470 (0.0028)
[2024-09-29 15:57:54,931][00191] Fps is (10 sec: 4097.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1929216. Throughput: 0: 982.1. Samples: 482670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:54,938][00191] Avg episode reward: [(0, '13.395')]
[2024-09-29 15:57:59,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1949696. Throughput: 0: 1017.8. Samples: 486256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:57:59,934][00191] Avg episode reward: [(0, '13.971')]
[2024-09-29 15:58:00,042][05153] Saving new best policy, reward=13.971!
[2024-09-29 15:58:04,357][05166] Updated weights for policy 0, policy_version 480 (0.0034)
[2024-09-29 15:58:04,932][00191] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1966080. Throughput: 0: 990.7. Samples: 491032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:58:04,935][00191] Avg episode reward: [(0, '14.377')]
[2024-09-29 15:58:04,939][05153] Saving new best policy, reward=14.377!
[2024-09-29 15:58:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1986560. Throughput: 0: 969.8. Samples: 497218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:09,939][00191] Avg episode reward: [(0, '15.001')]
[2024-09-29 15:58:09,983][05153] Saving new best policy, reward=15.001!
[2024-09-29 15:58:13,410][05166] Updated weights for policy 0, policy_version 490 (0.0021)
[2024-09-29 15:58:14,931][00191] Fps is (10 sec: 4505.9, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 2011136. Throughput: 0: 999.5. Samples: 500644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:14,938][00191] Avg episode reward: [(0, '15.733')]
[2024-09-29 15:58:14,943][05153] Saving new best policy, reward=15.733!
[2024-09-29 15:58:19,933][00191] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 2027520. Throughput: 0: 1009.1. Samples: 506360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:19,936][00191] Avg episode reward: [(0, '15.831')]
[2024-09-29 15:58:19,953][05153] Saving new best policy, reward=15.831!
[2024-09-29 15:58:24,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3860.0). Total num frames: 2043904. Throughput: 0: 959.2. Samples: 511388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:24,939][00191] Avg episode reward: [(0, '17.348')]
[2024-09-29 15:58:24,941][05153] Saving new best policy, reward=17.348!
[2024-09-29 15:58:25,221][05166] Updated weights for policy 0, policy_version 500 (0.0027)
[2024-09-29 15:58:29,931][00191] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2068480. Throughput: 0: 967.8. Samples: 514862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:29,934][00191] Avg episode reward: [(0, '18.564')]
[2024-09-29 15:58:29,944][05153] Saving new best policy, reward=18.564!
[2024-09-29 15:58:34,364][05166] Updated weights for policy 0, policy_version 510 (0.0034)
[2024-09-29 15:58:34,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2088960. Throughput: 0: 1025.9. Samples: 521706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:58:34,934][00191] Avg episode reward: [(0, '18.982')]
[2024-09-29 15:58:34,935][05153] Saving new best policy, reward=18.982!
[2024-09-29 15:58:39,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2105344. Throughput: 0: 963.6. Samples: 526030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:39,936][00191] Avg episode reward: [(0, '19.105')]
[2024-09-29 15:58:39,945][05153] Saving new best policy, reward=19.105!
[2024-09-29 15:58:44,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 2125824. Throughput: 0: 958.3. Samples: 529380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:58:44,935][00191] Avg episode reward: [(0, '20.089')]
[2024-09-29 15:58:44,940][05153] Saving new best policy, reward=20.089!
[2024-09-29 15:58:45,182][05166] Updated weights for policy 0, policy_version 520 (0.0030)
[2024-09-29 15:58:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2150400. Throughput: 0: 1005.3. Samples: 536268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:58:49,933][00191] Avg episode reward: [(0, '18.359')]
[2024-09-29 15:58:54,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2162688. Throughput: 0: 978.2. Samples: 541236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:58:54,936][00191] Avg episode reward: [(0, '18.787')]
[2024-09-29 15:58:56,478][05166] Updated weights for policy 0, policy_version 530 (0.0039)
[2024-09-29 15:58:59,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2183168. Throughput: 0: 960.3. Samples: 543858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:58:59,940][00191] Avg episode reward: [(0, '19.080')]
[2024-09-29 15:58:59,960][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000534_2187264.pth...
[2024-09-29 15:59:00,096][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth
[2024-09-29 15:59:04,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3873.8). Total num frames: 2207744. Throughput: 0: 990.6. Samples: 550936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:59:04,939][00191] Avg episode reward: [(0, '19.559')]
[2024-09-29 15:59:05,248][05166] Updated weights for policy 0, policy_version 540 (0.0035)
[2024-09-29 15:59:09,931][00191] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2224128. Throughput: 0: 1005.6. Samples: 556640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:59:09,934][00191] Avg episode reward: [(0, '19.132')]
[2024-09-29 15:59:14,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.1). Total num frames: 2240512. Throughput: 0: 976.6. Samples: 558810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:59:14,934][00191] Avg episode reward: [(0, '19.595')]
[2024-09-29 15:59:16,566][05166] Updated weights for policy 0, policy_version 550 (0.0014)
[2024-09-29 15:59:19,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 2265088. Throughput: 0: 968.7. Samples: 565298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:59:19,936][00191] Avg episode reward: [(0, '21.026')]
[2024-09-29 15:59:19,947][05153] Saving new best policy, reward=21.026!
[2024-09-29 15:59:24,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2285568. Throughput: 0: 1023.9. Samples: 572106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:59:24,934][00191] Avg episode reward: [(0, '19.383')]
[2024-09-29 15:59:26,559][05166] Updated weights for policy 0, policy_version 560 (0.0050)
[2024-09-29 15:59:29,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2301952. Throughput: 0: 996.9. Samples: 574240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:59:29,935][00191] Avg episode reward: [(0, '19.259')]
[2024-09-29 15:59:34,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2326528. Throughput: 0: 973.8. Samples: 580090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:59:34,933][00191] Avg episode reward: [(0, '18.192')]
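The checkpoint filenames encode both the policy_version and the total frame count, and in this run the two are locked together at 4096 frames per version (the learner's batch per iteration): checkpoint_000000534_2187264.pth is version 534, and 534 × 4096 = 2,187,264 frames. A quick consistency check over the names seen so far:

```python
# frames = policy_version * 4096 for every checkpoint name in this log
for version, frames in [(76, 311296), (192, 786432), (308, 1261568),
                        (418, 1712128), (534, 2187264)]:
    assert frames == version * 4096
```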
[2024-09-29 15:59:36,556][05166] Updated weights for policy 0, policy_version 570 (0.0022)
[2024-09-29 15:59:39,931][00191] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2351104. Throughput: 0: 1024.2. Samples: 587326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 15:59:39,934][00191] Avg episode reward: [(0, '18.463')]
[2024-09-29 15:59:44,939][00191] Fps is (10 sec: 3683.6, 60 sec: 3959.0, 300 sec: 3859.9). Total num frames: 2363392. Throughput: 0: 1026.1. Samples: 590042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:59:44,945][00191] Avg episode reward: [(0, '18.205')]
[2024-09-29 15:59:47,740][05166] Updated weights for policy 0, policy_version 580 (0.0026)
[2024-09-29 15:59:49,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2383872. Throughput: 0: 973.3. Samples: 594736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 15:59:49,933][00191] Avg episode reward: [(0, '18.728')]
[2024-09-29 15:59:54,931][00191] Fps is (10 sec: 4509.0, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2408448. Throughput: 0: 1007.5. Samples: 601976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 15:59:54,934][00191] Avg episode reward: [(0, '20.152')]
[2024-09-29 15:59:56,201][05166] Updated weights for policy 0, policy_version 590 (0.0017)
[2024-09-29 15:59:59,933][00191] Fps is (10 sec: 4504.6, 60 sec: 4095.9, 300 sec: 3887.7). Total num frames: 2428928. Throughput: 0: 1040.5. Samples: 605636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-29 15:59:59,936][00191] Avg episode reward: [(0, '20.679')]
[2024-09-29 16:00:04,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2445312. Throughput: 0: 992.9. Samples: 609980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:00:04,934][00191] Avg episode reward: [(0, '20.194')]
[2024-09-29 16:00:07,486][05166] Updated weights for policy 0, policy_version 600 (0.0065)
[2024-09-29 16:00:09,931][00191] Fps is (10 sec: 3687.2, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2465792. Throughput: 0: 992.6. Samples: 616772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:00:09,933][00191] Avg episode reward: [(0, '20.440')]
[2024-09-29 16:00:14,937][00191] Fps is (10 sec: 4503.0, 60 sec: 4163.9, 300 sec: 3943.2). Total num frames: 2490368. Throughput: 0: 1024.6. Samples: 620354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 16:00:14,942][00191] Avg episode reward: [(0, '20.656')]
[2024-09-29 16:00:17,073][05166] Updated weights for policy 0, policy_version 610 (0.0015)
[2024-09-29 16:00:19,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2502656. Throughput: 0: 1015.6. Samples: 625794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:00:19,941][00191] Avg episode reward: [(0, '21.101')]
[2024-09-29 16:00:19,975][05153] Saving new best policy, reward=21.101!
[2024-09-29 16:00:24,934][00191] Fps is (10 sec: 2868.1, 60 sec: 3891.0, 300 sec: 3929.3). Total num frames: 2519040. Throughput: 0: 943.5. Samples: 629788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:00:24,938][00191] Avg episode reward: [(0, '20.314')]
[2024-09-29 16:00:28,694][05166] Updated weights for policy 0, policy_version 620 (0.0037)
[2024-09-29 16:00:29,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2543616. Throughput: 0: 962.7. Samples: 633358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 16:00:29,933][00191] Avg episode reward: [(0, '19.434')]
[2024-09-29 16:00:34,931][00191] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2560000. Throughput: 0: 1002.5. Samples: 639848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 16:00:34,935][00191] Avg episode reward: [(0, '20.418')]
[2024-09-29 16:00:39,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2576384. Throughput: 0: 948.7. Samples: 644666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-29 16:00:39,933][00191] Avg episode reward: [(0, '19.475')]
[2024-09-29 16:00:39,959][05166] Updated weights for policy 0, policy_version 630 (0.0039)
[2024-09-29 16:00:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3960.0, 300 sec: 3943.3). Total num frames: 2600960. Throughput: 0: 945.5. Samples: 648182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 16:00:44,933][00191] Avg episode reward: [(0, '19.752')]
[2024-09-29 16:00:48,490][05166] Updated weights for policy 0, policy_version 640 (0.0053)
[2024-09-29 16:00:49,932][00191] Fps is (10 sec: 4914.9, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2625536. Throughput: 0: 1009.8. Samples: 655422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:00:49,937][00191] Avg episode reward: [(0, '20.897')]
[2024-09-29 16:00:54,933][00191] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3915.5). Total num frames: 2637824. Throughput: 0: 959.0. Samples: 659928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-29 16:00:54,935][00191] Avg episode reward: [(0, '20.980')]
[2024-09-29 16:00:59,785][05166] Updated weights for policy 0, policy_version 650 (0.0041)
[2024-09-29 16:00:59,931][00191] Fps is (10 sec: 3686.7, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 2662400. Throughput: 0: 945.9. Samples: 662916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:00:59,938][00191] Avg episode reward: [(0, '20.267')]
[2024-09-29 16:00:59,950][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000650_2662400.pth...
[2024-09-29 16:01:00,074][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth
[2024-09-29 16:01:04,931][00191] Fps is (10 sec: 4506.3, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2682880. Throughput: 0: 980.6. Samples: 669920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:01:04,937][00191] Avg episode reward: [(0, '21.667')]
[2024-09-29 16:01:04,942][05153] Saving new best policy, reward=21.667!
[2024-09-29 16:01:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2699264. Throughput: 0: 1007.2. Samples: 675108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-29 16:01:09,934][00191] Avg episode reward: [(0, '22.730')]
[2024-09-29 16:01:09,949][05153] Saving new best policy, reward=22.730!
[2024-09-29 16:01:10,851][05166] Updated weights for policy 0, policy_version 660 (0.0023)
[2024-09-29 16:01:14,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3943.3). Total num frames: 2719744. Throughput: 0: 974.7. Samples: 677220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
Throughput: 0: 974.7. Samples: 677220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:01:14,937][00191] Avg episode reward: [(0, '23.875')] [2024-09-29 16:01:14,939][05153] Saving new best policy, reward=23.875! [2024-09-29 16:01:19,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2740224. Throughput: 0: 986.9. Samples: 684260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:01:19,938][00191] Avg episode reward: [(0, '24.704')] [2024-09-29 16:01:19,956][05153] Saving new best policy, reward=24.704! [2024-09-29 16:01:20,196][05166] Updated weights for policy 0, policy_version 670 (0.0035) [2024-09-29 16:01:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3943.3). Total num frames: 2760704. Throughput: 0: 1015.2. Samples: 690348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:01:24,937][00191] Avg episode reward: [(0, '24.417')] [2024-09-29 16:01:29,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2772992. Throughput: 0: 982.0. Samples: 692372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:01:29,934][00191] Avg episode reward: [(0, '24.873')] [2024-09-29 16:01:29,945][05153] Saving new best policy, reward=24.873! [2024-09-29 16:01:32,042][05166] Updated weights for policy 0, policy_version 680 (0.0024) [2024-09-29 16:01:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2797568. Throughput: 0: 947.6. Samples: 698064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:01:34,934][00191] Avg episode reward: [(0, '21.637')] [2024-09-29 16:01:39,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2818048. Throughput: 0: 1009.4. Samples: 705348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:01:39,936][00191] Avg episode reward: [(0, '20.137')] [2024-09-29 16:01:41,455][05166] Updated weights for policy 0, policy_version 690 (0.0024) [2024-09-29 16:01:44,931][00191] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2834432. Throughput: 0: 991.3. Samples: 707524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:01:44,934][00191] Avg episode reward: [(0, '19.924')] [2024-09-29 16:01:49,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 2854912. Throughput: 0: 956.0. Samples: 712938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:01:49,938][00191] Avg episode reward: [(0, '19.015')] [2024-09-29 16:01:51,811][05166] Updated weights for policy 0, policy_version 700 (0.0027) [2024-09-29 16:01:54,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3957.1). Total num frames: 2879488. Throughput: 0: 998.7. Samples: 720052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:01:54,938][00191] Avg episode reward: [(0, '18.696')] [2024-09-29 16:01:59,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2895872. Throughput: 0: 1021.3. Samples: 723178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 16:01:59,938][00191] Avg episode reward: [(0, '18.821')] [2024-09-29 16:02:03,008][05166] Updated weights for policy 0, policy_version 710 (0.0015) [2024-09-29 16:02:04,931][00191] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2916352. Throughput: 0: 959.7. Samples: 727448. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 16:02:04,933][00191] Avg episode reward: [(0, '19.998')] [2024-09-29 16:02:09,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2936832. Throughput: 0: 980.2. Samples: 734458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:02:09,934][00191] Avg episode reward: [(0, '20.637')] [2024-09-29 16:02:11,968][05166] Updated weights for policy 0, policy_version 720 (0.0030) [2024-09-29 16:02:14,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2957312. Throughput: 0: 1009.4. Samples: 737796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:02:14,935][00191] Avg episode reward: [(0, '22.813')] [2024-09-29 16:02:19,935][00191] Fps is (10 sec: 3275.7, 60 sec: 3822.7, 300 sec: 3915.5). Total num frames: 2969600. Throughput: 0: 987.0. Samples: 742482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 16:02:19,938][00191] Avg episode reward: [(0, '22.809')] [2024-09-29 16:02:23,534][05166] Updated weights for policy 0, policy_version 730 (0.0039) [2024-09-29 16:02:24,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2994176. Throughput: 0: 961.0. Samples: 748594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:02:24,935][00191] Avg episode reward: [(0, '23.452')] [2024-09-29 16:02:29,931][00191] Fps is (10 sec: 4916.8, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3018752. Throughput: 0: 991.3. Samples: 752134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:02:29,934][00191] Avg episode reward: [(0, '21.245')] [2024-09-29 16:02:33,493][05166] Updated weights for policy 0, policy_version 740 (0.0045) [2024-09-29 16:02:34,931][00191] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3035136. Throughput: 0: 1002.0. Samples: 758030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:02:34,939][00191] Avg episode reward: [(0, '20.993')] [2024-09-29 16:02:39,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3051520. Throughput: 0: 961.6. Samples: 763322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:02:39,937][00191] Avg episode reward: [(0, '19.853')] [2024-09-29 16:02:43,513][05166] Updated weights for policy 0, policy_version 750 (0.0033) [2024-09-29 16:02:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3076096. Throughput: 0: 970.5. Samples: 766852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:02:44,937][00191] Avg episode reward: [(0, '21.536')] [2024-09-29 16:02:49,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3096576. Throughput: 0: 1026.0. Samples: 773620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:02:49,936][00191] Avg episode reward: [(0, '22.186')] [2024-09-29 16:02:54,905][05166] Updated weights for policy 0, policy_version 760 (0.0018) [2024-09-29 16:02:54,931][00191] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3112960. Throughput: 0: 964.0. Samples: 777838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:02:54,940][00191] Avg episode reward: [(0, '23.164')] [2024-09-29 16:02:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3133440. Throughput: 0: 967.6. Samples: 781338. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:02:59,934][00191] Avg episode reward: [(0, '22.492')] [2024-09-29 16:02:59,974][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth... [2024-09-29 16:03:00,096][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000534_2187264.pth [2024-09-29 16:03:03,472][05166] Updated weights for policy 0, policy_version 770 (0.0021) [2024-09-29 16:03:04,933][00191] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3971.0). Total num frames: 3158016. Throughput: 0: 1021.6. Samples: 788452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:03:04,935][00191] Avg episode reward: [(0, '23.943')] [2024-09-29 16:03:09,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3170304. Throughput: 0: 994.7. Samples: 793354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:03:09,940][00191] Avg episode reward: [(0, '22.023')] [2024-09-29 16:03:14,764][05166] Updated weights for policy 0, policy_version 780 (0.0045) [2024-09-29 16:03:14,931][00191] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3194880. Throughput: 0: 972.9. Samples: 795914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:03:14,933][00191] Avg episode reward: [(0, '20.818')] [2024-09-29 16:03:19,931][00191] Fps is (10 sec: 4915.3, 60 sec: 4164.5, 300 sec: 3984.9). Total num frames: 3219456. Throughput: 0: 1003.2. Samples: 803172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 16:03:19,933][00191] Avg episode reward: [(0, '21.393')] [2024-09-29 16:03:24,773][05166] Updated weights for policy 0, policy_version 790 (0.0025) [2024-09-29 16:03:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3235840. Throughput: 0: 1012.2. Samples: 808870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:03:24,935][00191] Avg episode reward: [(0, '21.795')] [2024-09-29 16:03:29,931][00191] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3252224. Throughput: 0: 981.3. Samples: 811010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 16:03:29,938][00191] Avg episode reward: [(0, '22.471')] [2024-09-29 16:03:34,783][05166] Updated weights for policy 0, policy_version 800 (0.0035) [2024-09-29 16:03:34,931][00191] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3276800. Throughput: 0: 978.3. Samples: 817642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:03:34,939][00191] Avg episode reward: [(0, '21.974')] [2024-09-29 16:03:39,933][00191] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 3297280. Throughput: 0: 1037.9. Samples: 824544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:03:39,935][00191] Avg episode reward: [(0, '22.485')] [2024-09-29 16:03:44,931][00191] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3309568. Throughput: 0: 1006.9. Samples: 826650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:03:44,934][00191] Avg episode reward: [(0, '22.544')] [2024-09-29 16:03:46,222][05166] Updated weights for policy 0, policy_version 810 (0.0049) [2024-09-29 16:03:49,931][00191] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3334144. Throughput: 0: 969.7. Samples: 832088. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:03:49,934][00191] Avg episode reward: [(0, '23.169')] [2024-09-29 16:03:54,922][05166] Updated weights for policy 0, policy_version 820 (0.0043) [2024-09-29 16:03:54,931][00191] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3358720. Throughput: 0: 1018.5. Samples: 839186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:03:54,937][00191] Avg episode reward: [(0, '22.466')] [2024-09-29 16:03:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3371008. Throughput: 0: 1027.9. Samples: 842170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:03:59,937][00191] Avg episode reward: [(0, '23.727')] [2024-09-29 16:04:04,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 3391488. Throughput: 0: 964.8. Samples: 846586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:04:04,940][00191] Avg episode reward: [(0, '23.965')] [2024-09-29 16:04:06,303][05166] Updated weights for policy 0, policy_version 830 (0.0048) [2024-09-29 16:04:09,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3416064. Throughput: 0: 996.2. Samples: 853698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:04:09,934][00191] Avg episode reward: [(0, '25.442')] [2024-09-29 16:04:09,945][05153] Saving new best policy, reward=25.442! [2024-09-29 16:04:14,933][00191] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3432448. Throughput: 0: 1027.3. Samples: 857238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:04:14,940][00191] Avg episode reward: [(0, '25.055')] [2024-09-29 16:04:16,566][05166] Updated weights for policy 0, policy_version 840 (0.0026) [2024-09-29 16:04:19,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3448832. Throughput: 0: 980.5. Samples: 861766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:04:19,938][00191] Avg episode reward: [(0, '24.460')] [2024-09-29 16:04:24,931][00191] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3473408. Throughput: 0: 961.1. Samples: 867792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:04:24,938][00191] Avg episode reward: [(0, '21.985')] [2024-09-29 16:04:26,632][05166] Updated weights for policy 0, policy_version 850 (0.0032) [2024-09-29 16:04:29,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3493888. Throughput: 0: 992.6. Samples: 871318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:04:29,935][00191] Avg episode reward: [(0, '20.985')] [2024-09-29 16:04:34,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3510272. Throughput: 0: 999.3. Samples: 877056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:04:34,935][00191] Avg episode reward: [(0, '19.961')] [2024-09-29 16:04:37,982][05166] Updated weights for policy 0, policy_version 860 (0.0025) [2024-09-29 16:04:39,931][00191] Fps is (10 sec: 3686.3, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 3530752. Throughput: 0: 958.9. Samples: 882336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:04:39,938][00191] Avg episode reward: [(0, '19.138')] [2024-09-29 16:04:44,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3551232. Throughput: 0: 969.7. Samples: 885806. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:04:44,934][00191] Avg episode reward: [(0, '19.395')] [2024-09-29 16:04:46,889][05166] Updated weights for policy 0, policy_version 870 (0.0032) [2024-09-29 16:04:49,931][00191] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3571712. Throughput: 0: 1018.7. Samples: 892428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 16:04:49,935][00191] Avg episode reward: [(0, '19.286')] [2024-09-29 16:04:54,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3584000. Throughput: 0: 953.9. Samples: 896624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:04:54,933][00191] Avg episode reward: [(0, '19.727')] [2024-09-29 16:04:58,301][05166] Updated weights for policy 0, policy_version 880 (0.0046) [2024-09-29 16:04:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3608576. Throughput: 0: 954.2. Samples: 900174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:04:59,934][00191] Avg episode reward: [(0, '21.502')] [2024-09-29 16:04:59,947][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000881_3608576.pth... [2024-09-29 16:05:00,096][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000650_2662400.pth [2024-09-29 16:05:04,931][00191] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3633152. Throughput: 0: 1005.3. Samples: 907006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:05:04,937][00191] Avg episode reward: [(0, '21.333')] [2024-09-29 16:05:09,332][05166] Updated weights for policy 0, policy_version 890 (0.0030) [2024-09-29 16:05:09,934][00191] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3915.5). Total num frames: 3645440. Throughput: 0: 977.1. Samples: 911762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:05:09,940][00191] Avg episode reward: [(0, '22.647')] [2024-09-29 16:05:14,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 3665920. Throughput: 0: 957.5. Samples: 914404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 16:05:14,936][00191] Avg episode reward: [(0, '24.271')] [2024-09-29 16:05:18,489][05166] Updated weights for policy 0, policy_version 900 (0.0033) [2024-09-29 16:05:19,935][00191] Fps is (10 sec: 4505.1, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 3690496. Throughput: 0: 988.6. Samples: 921546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:05:19,938][00191] Avg episode reward: [(0, '24.812')] [2024-09-29 16:05:24,933][00191] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3943.2). Total num frames: 3706880. Throughput: 0: 997.6. Samples: 927228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:05:24,943][00191] Avg episode reward: [(0, '24.405')] [2024-09-29 16:05:29,931][00191] Fps is (10 sec: 3278.0, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3723264. Throughput: 0: 968.0. Samples: 929366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-29 16:05:29,940][00191] Avg episode reward: [(0, '23.616')] [2024-09-29 16:05:30,181][05166] Updated weights for policy 0, policy_version 910 (0.0043) [2024-09-29 16:05:34,931][00191] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3747840. Throughput: 0: 967.6. Samples: 935970. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 16:05:34,937][00191] Avg episode reward: [(0, '24.092')] [2024-09-29 16:05:38,814][05166] Updated weights for policy 0, policy_version 920 (0.0013) [2024-09-29 16:05:39,938][00191] Fps is (10 sec: 4502.6, 60 sec: 3959.1, 300 sec: 3957.1). Total num frames: 3768320. Throughput: 0: 1026.1. Samples: 942806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 16:05:39,942][00191] Avg episode reward: [(0, '22.505')] [2024-09-29 16:05:44,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3784704. Throughput: 0: 994.2. Samples: 944914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:05:44,937][00191] Avg episode reward: [(0, '21.239')] [2024-09-29 16:05:49,695][05166] Updated weights for policy 0, policy_version 930 (0.0022) [2024-09-29 16:05:49,931][00191] Fps is (10 sec: 4098.7, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 3809280. Throughput: 0: 971.0. Samples: 950700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:05:49,934][00191] Avg episode reward: [(0, '23.528')] [2024-09-29 16:05:54,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3829760. Throughput: 0: 1021.3. Samples: 957720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-29 16:05:54,934][00191] Avg episode reward: [(0, '22.984')] [2024-09-29 16:05:59,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3846144. Throughput: 0: 1021.0. Samples: 960350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:05:59,933][00191] Avg episode reward: [(0, '21.638')] [2024-09-29 16:06:00,784][05166] Updated weights for policy 0, policy_version 940 (0.0035) [2024-09-29 16:06:04,931][00191] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3866624. Throughput: 0: 967.6. Samples: 965086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:06:04,936][00191] Avg episode reward: [(0, '23.247')] [2024-09-29 16:06:09,827][05166] Updated weights for policy 0, policy_version 950 (0.0021) [2024-09-29 16:06:09,931][00191] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 3971.0). Total num frames: 3891200. Throughput: 0: 1001.0. Samples: 972270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:06:09,934][00191] Avg episode reward: [(0, '23.005')] [2024-09-29 16:06:14,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3907584. Throughput: 0: 1033.2. Samples: 975860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-29 16:06:14,934][00191] Avg episode reward: [(0, '23.563')] [2024-09-29 16:06:19,931][00191] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3943.3). Total num frames: 3923968. Throughput: 0: 985.6. Samples: 980324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-29 16:06:19,933][00191] Avg episode reward: [(0, '23.896')] [2024-09-29 16:06:21,159][05166] Updated weights for policy 0, policy_version 960 (0.0040) [2024-09-29 16:06:24,931][00191] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 3948544. Throughput: 0: 980.8. Samples: 986934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-29 16:06:24,934][00191] Avg episode reward: [(0, '26.992')] [2024-09-29 16:06:24,939][05153] Saving new best policy, reward=26.992! [2024-09-29 16:06:29,931][00191] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 3969024. Throughput: 0: 1012.5. Samples: 990476. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-29 16:06:29,937][00191] Avg episode reward: [(0, '25.832')] [2024-09-29 16:06:30,212][05166] Updated weights for policy 0, policy_version 970 (0.0030) [2024-09-29 16:06:34,933][00191] Fps is (10 sec: 3685.6, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 3985408. Throughput: 0: 998.5. Samples: 995634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-29 16:06:34,936][00191] Avg episode reward: [(0, '25.356')] [2024-09-29 16:06:39,144][05153] Stopping Batcher_0... [2024-09-29 16:06:39,146][05153] Loop batcher_evt_loop terminating... [2024-09-29 16:06:39,147][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-29 16:06:39,155][00191] Component Batcher_0 stopped! [2024-09-29 16:06:39,227][05166] Weights refcount: 2 0 [2024-09-29 16:06:39,242][05166] Stopping InferenceWorker_p0-w0... [2024-09-29 16:06:39,243][05166] Loop inference_proc0-0_evt_loop terminating... [2024-09-29 16:06:39,247][00191] Component InferenceWorker_p0-w0 stopped! [2024-09-29 16:06:39,296][05153] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth [2024-09-29 16:06:39,312][05153] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-29 16:06:39,491][05153] Stopping LearnerWorker_p0... [2024-09-29 16:06:39,498][05153] Loop learner_proc0_evt_loop terminating... [2024-09-29 16:06:39,494][00191] Component LearnerWorker_p0 stopped! [2024-09-29 16:06:39,544][00191] Component RolloutWorker_w1 stopped! [2024-09-29 16:06:39,550][05167] Stopping RolloutWorker_w1... [2024-09-29 16:06:39,551][05167] Loop rollout_proc1_evt_loop terminating... [2024-09-29 16:06:39,576][00191] Component RolloutWorker_w5 stopped! [2024-09-29 16:06:39,581][05172] Stopping RolloutWorker_w5... [2024-09-29 16:06:39,582][05172] Loop rollout_proc5_evt_loop terminating... [2024-09-29 16:06:39,588][00191] Component RolloutWorker_w3 stopped! [2024-09-29 16:06:39,592][05169] Stopping RolloutWorker_w3... [2024-09-29 16:06:39,593][05169] Loop rollout_proc3_evt_loop terminating... [2024-09-29 16:06:39,602][00191] Component RolloutWorker_w7 stopped! [2024-09-29 16:06:39,606][05173] Stopping RolloutWorker_w7... [2024-09-29 16:06:39,606][05173] Loop rollout_proc7_evt_loop terminating... [2024-09-29 16:06:39,729][05170] Stopping RolloutWorker_w2... [2024-09-29 16:06:39,728][00191] Component RolloutWorker_w2 stopped! [2024-09-29 16:06:39,737][05170] Loop rollout_proc2_evt_loop terminating... [2024-09-29 16:06:39,739][05171] Stopping RolloutWorker_w4... [2024-09-29 16:06:39,738][00191] Component RolloutWorker_w4 stopped! [2024-09-29 16:06:39,746][05171] Loop rollout_proc4_evt_loop terminating... [2024-09-29 16:06:39,763][00191] Component RolloutWorker_w0 stopped! [2024-09-29 16:06:39,771][00191] Component RolloutWorker_w6 stopped! [2024-09-29 16:06:39,771][05174] Stopping RolloutWorker_w6... [2024-09-29 16:06:39,763][05168] Stopping RolloutWorker_w0... [2024-09-29 16:06:39,772][00191] Waiting for process learner_proc0 to stop... [2024-09-29 16:06:39,781][05168] Loop rollout_proc0_evt_loop terminating... [2024-09-29 16:06:39,773][05174] Loop rollout_proc6_evt_loop terminating... [2024-09-29 16:06:40,971][00191] Waiting for process inference_proc0-0 to join... [2024-09-29 16:06:40,973][00191] Waiting for process rollout_proc0 to join... [2024-09-29 16:06:43,003][00191] Waiting for process rollout_proc1 to join... 
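The training phase ends here; while the runner waits for the remaining rollout workers to join (the waits continue below), it is worth decoding the report lines that dominate the log above: each one combines frame throughput over sliding 10/60/300-second windows, the policy-lag distribution (how many policy versions behind the learner the sampled experience was), and the "Saving new best policy" rule. A minimal sketch of that bookkeeping in Python; all names here are hypothetical, not Sample Factory's actual code:

    import time
    from collections import deque

    class TrainStats:
        """Hypothetical reporter mirroring the 'Fps is (10 sec/60 sec/300 sec)',
        'Policy #0 lag' and 'Saving new best policy' lines; not the library's
        own implementation."""

        def __init__(self, horizons=(10, 60, 300)):
            self.horizons = horizons
            self.samples = deque()          # (wall_time, total_frames) snapshots
            self.best_reward = float("-inf")

        def record_frames(self, total_frames):
            now = time.time()
            self.samples.append((now, total_frames))
            while now - self.samples[0][0] > max(self.horizons) + 5:
                self.samples.popleft()      # drop history older than the largest window

        def fps(self, horizon):
            now, frames_now = self.samples[-1]
            # oldest snapshot still inside this window
            t0, f0 = next((s for s in self.samples if now - s[0] <= horizon),
                          self.samples[0])
            return (frames_now - f0) / (now - t0) if now > t0 else float("nan")

        @staticmethod
        def policy_lag(current_version, sample_versions):
            # lag = learner's newest version minus the version that produced each sample
            lags = [current_version - v for v in sample_versions]
            return min(lags), sum(lags) / len(lags), max(lags)

        def maybe_save_best(self, avg_episode_reward, save_fn):
            # mirrors 'Saving new best policy, reward=...!'
            if avg_episode_reward > self.best_reward:
                self.best_reward = avg_episode_reward
                save_fn()
                return True
            return False

The lag stayed between 0.4 and 0.7 versions on average throughout this run, meaning the rollouts remained close to on-policy.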
[2024-09-29 16:06:43,004][00191] Waiting for process rollout_proc2 to join...
[2024-09-29 16:06:43,011][00191] Waiting for process rollout_proc3 to join...
[2024-09-29 16:06:43,012][00191] Waiting for process rollout_proc4 to join...
[2024-09-29 16:06:43,015][00191] Waiting for process rollout_proc5 to join...
[2024-09-29 16:06:43,016][00191] Waiting for process rollout_proc6 to join...
[2024-09-29 16:06:43,018][00191] Waiting for process rollout_proc7 to join...
[2024-09-29 16:06:43,019][00191] Batcher 0 profile tree view:
batching: 28.2059, releasing_batches: 0.0396
[2024-09-29 16:06:43,021][00191] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0024
  wait_policy_total: 388.7123
update_model: 9.1986
  weight_update: 0.0034
one_step: 0.0052
  handle_policy_step: 592.6184
    deserialize: 14.7357, stack: 3.1584, obs_to_device_normalize: 121.2650, forward: 313.3477, send_messages: 27.6779
    prepare_outputs: 82.7950
      to_cpu: 47.5294
[2024-09-29 16:06:43,022][00191] Learner 0 profile tree view:
misc: 0.0064, prepare_batch: 13.4630
train: 74.9971
  epoch_init: 0.0085, minibatch_init: 0.0064, losses_postprocess: 0.6486, kl_divergence: 0.6932, after_optimizer: 33.6219
  calculate_losses: 27.1642
    losses_init: 0.0039, forward_head: 1.3007, bptt_initial: 18.4265, tail: 1.0064, advantages_returns: 0.2818, losses: 3.8878
    bptt: 1.9420
      bptt_forward_core: 1.8378
  update: 12.1432
    clip: 0.9211
[2024-09-29 16:06:43,024][00191] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3493, enqueue_policy_requests: 88.6628, env_step: 811.2053, overhead: 12.0389, complete_rollouts: 7.0654
save_policy_outputs: 19.0850
  split_output_tensors: 7.6624
[2024-09-29 16:06:43,025][00191] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2917, enqueue_policy_requests: 93.0825, env_step: 801.9541, overhead: 12.2341, complete_rollouts: 6.6274
save_policy_outputs: 19.1569
  split_output_tensors: 7.5929
[2024-09-29 16:06:43,026][00191] Loop Runner_EvtLoop terminating...
[2024-09-29 16:06:43,028][00191] Runner profile tree view:
main_loop: 1057.6423
[2024-09-29 16:06:43,029][00191] Collected {0: 4005888}, FPS: 3787.6
[2024-09-29 16:06:48,817][00191] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-29 16:06:48,819][00191] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-29 16:06:48,821][00191] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-29 16:06:48,824][00191] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-29 16:06:48,825][00191] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-29 16:06:48,827][00191] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-29 16:06:48,829][00191] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-29 16:06:48,830][00191] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-29 16:06:48,831][00191] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-29 16:06:48,832][00191] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-29 16:06:48,834][00191] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-29 16:06:48,835][00191] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
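The profile tree views in the teardown block above are nested wall-clock timings: an indented entry was measured inside its parent timer, so forward (313 s) accounts for most of handle_policy_step (593 s) on the inference worker, and env_step (~811 s) dominates the rollout workers. A toy context-manager profiler that produces this kind of nested report (a sketch, not necessarily how Sample Factory's own timing utility works); the evaluation configuration messages continue below it:

    from contextlib import contextmanager
    from time import perf_counter

    class TreeProfiler:
        """Toy nested profiler whose report is shaped like the
        'profile tree view' blocks above (hypothetical helper)."""

        def __init__(self):
            self.root = {}          # name -> [elapsed_seconds, children]
            self.stack = [self.root]

        @contextmanager
        def timeit(self, name):
            node = self.stack[-1].setdefault(name, [0.0, {}])
            self.stack.append(node[1])
            start = perf_counter()
            try:
                yield
            finally:
                node[0] += perf_counter() - start
                self.stack.pop()

        def report(self, node=None, indent=0):
            if node is None:
                node = self.root
            for name, (elapsed, children) in node.items():
                print("  " * indent + f"{name}: {elapsed:.4f}")
                if children:
                    self.report(children, indent + 1)

    # usage: nested timers produce nested report lines
    # prof = TreeProfiler()
    # with prof.timeit("train"):
    #     with prof.timeit("calculate_losses"):
    #         pass
    # prof.report()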
[2024-09-29 16:06:48,836][00191] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-29 16:06:48,837][00191] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-29 16:06:48,838][00191] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-29 16:06:48,872][00191] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-29 16:06:48,875][00191] RunningMeanStd input shape: (3, 72, 128) [2024-09-29 16:06:48,877][00191] RunningMeanStd input shape: (1,) [2024-09-29 16:06:48,894][00191] ConvEncoder: input_channels=3 [2024-09-29 16:06:48,996][00191] Conv encoder output size: 512 [2024-09-29 16:06:48,997][00191] Policy head output size: 512 [2024-09-29 16:06:49,269][00191] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-29 16:06:50,090][00191] Num frames 100... [2024-09-29 16:06:50,213][00191] Num frames 200... [2024-09-29 16:06:50,334][00191] Num frames 300... [2024-09-29 16:06:50,457][00191] Num frames 400... [2024-09-29 16:06:50,577][00191] Avg episode rewards: #0: 8.480, true rewards: #0: 4.480 [2024-09-29 16:06:50,579][00191] Avg episode reward: 8.480, avg true_objective: 4.480 [2024-09-29 16:06:50,644][00191] Num frames 500... [2024-09-29 16:06:50,771][00191] Num frames 600... [2024-09-29 16:06:50,890][00191] Num frames 700... [2024-09-29 16:06:51,011][00191] Num frames 800... [2024-09-29 16:06:51,133][00191] Num frames 900... [2024-09-29 16:06:51,255][00191] Num frames 1000... [2024-09-29 16:06:51,381][00191] Num frames 1100... [2024-09-29 16:06:51,506][00191] Num frames 1200... [2024-09-29 16:06:51,642][00191] Num frames 1300... [2024-09-29 16:06:51,768][00191] Num frames 1400... [2024-09-29 16:06:51,888][00191] Num frames 1500... [2024-09-29 16:06:52,007][00191] Num frames 1600... [2024-09-29 16:06:52,139][00191] Num frames 1700... [2024-09-29 16:06:52,282][00191] Num frames 1800... [2024-09-29 16:06:52,405][00191] Num frames 1900... [2024-09-29 16:06:52,558][00191] Avg episode rewards: #0: 22.395, true rewards: #0: 9.895 [2024-09-29 16:06:52,559][00191] Avg episode reward: 22.395, avg true_objective: 9.895 [2024-09-29 16:06:52,593][00191] Num frames 2000... [2024-09-29 16:06:52,714][00191] Num frames 2100... [2024-09-29 16:06:52,839][00191] Num frames 2200... [2024-09-29 16:06:52,963][00191] Num frames 2300... [2024-09-29 16:06:53,083][00191] Num frames 2400... [2024-09-29 16:06:53,264][00191] Avg episode rewards: #0: 17.970, true rewards: #0: 8.303 [2024-09-29 16:06:53,265][00191] Avg episode reward: 17.970, avg true_objective: 8.303 [2024-09-29 16:06:53,279][00191] Num frames 2500... [2024-09-29 16:06:53,397][00191] Num frames 2600... [2024-09-29 16:06:53,517][00191] Num frames 2700... [2024-09-29 16:06:53,653][00191] Num frames 2800... [2024-09-29 16:06:53,774][00191] Num frames 2900... [2024-09-29 16:06:53,894][00191] Num frames 3000... [2024-09-29 16:06:54,013][00191] Num frames 3100... [2024-09-29 16:06:54,135][00191] Num frames 3200... [2024-09-29 16:06:54,254][00191] Num frames 3300... [2024-09-29 16:06:54,381][00191] Num frames 3400... [2024-09-29 16:06:54,505][00191] Num frames 3500... [2024-09-29 16:06:54,671][00191] Avg episode rewards: #0: 19.448, true rewards: #0: 8.947 [2024-09-29 16:06:54,673][00191] Avg episode reward: 19.448, avg true_objective: 8.947 [2024-09-29 16:06:54,703][00191] Num frames 3600... [2024-09-29 16:06:54,821][00191] Num frames 3700... [2024-09-29 16:06:54,939][00191] Num frames 3800... 
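The evaluation episodes continue below; first, a note on the configuration messages at the start of this run. The enjoy script reloads the training run's config.json, lets explicit command-line values override it, and back-fills evaluation-only options the saved file does not contain, which is exactly what the "Loading existing experiment configuration / Overriding arg / Adding new argument" lines record. Roughly, as a sketch (function and parameter names are made up):

    import json

    def load_eval_config(config_path, cli_overrides, eval_defaults, log):
        """Merge a saved training config with evaluation-time arguments.
        Sketch of the Overriding/Adding behaviour logged above."""
        log.info("Loading existing experiment configuration from %s", config_path)
        with open(config_path) as f:
            cfg = json.load(f)

        for key, value in cli_overrides.items():    # explicit CLI args win
            if cfg.get(key) != value:
                log.info("Overriding arg %r with value %r passed from command line", key, value)
            cfg[key] = value

        for key, value in eval_defaults.items():    # eval-only options the old file lacks
            if key not in cfg:
                log.info("Adding new argument %r=%r that is not in the saved config file!", key, value)
                cfg[key] = value
        return cfg

    # e.g. load_eval_config("/content/train_dir/default_experiment/config.json",
    #                       {"num_workers": 1},
    #                       {"no_render": True, "save_video": True, "max_num_episodes": 10},
    #                       logger)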
[2024-09-29 16:06:55,056][00191] Num frames 3900... [2024-09-29 16:06:55,178][00191] Num frames 4000... [2024-09-29 16:06:55,307][00191] Avg episode rewards: #0: 16.918, true rewards: #0: 8.118 [2024-09-29 16:06:55,308][00191] Avg episode reward: 16.918, avg true_objective: 8.118 [2024-09-29 16:06:55,359][00191] Num frames 4100... [2024-09-29 16:06:55,480][00191] Num frames 4200... [2024-09-29 16:06:55,613][00191] Num frames 4300... [2024-09-29 16:06:55,734][00191] Num frames 4400... [2024-09-29 16:06:55,878][00191] Avg episode rewards: #0: 15.125, true rewards: #0: 7.458 [2024-09-29 16:06:55,880][00191] Avg episode reward: 15.125, avg true_objective: 7.458 [2024-09-29 16:06:55,912][00191] Num frames 4500... [2024-09-29 16:06:56,030][00191] Num frames 4600... [2024-09-29 16:06:56,151][00191] Num frames 4700... [2024-09-29 16:06:56,269][00191] Num frames 4800... [2024-09-29 16:06:56,391][00191] Num frames 4900... [2024-09-29 16:06:56,512][00191] Num frames 5000... [2024-09-29 16:06:56,591][00191] Avg episode rewards: #0: 14.170, true rewards: #0: 7.170 [2024-09-29 16:06:56,593][00191] Avg episode reward: 14.170, avg true_objective: 7.170 [2024-09-29 16:06:56,699][00191] Num frames 5100... [2024-09-29 16:06:56,816][00191] Num frames 5200... [2024-09-29 16:06:56,938][00191] Num frames 5300... [2024-09-29 16:06:57,056][00191] Num frames 5400... [2024-09-29 16:06:57,175][00191] Num frames 5500... [2024-09-29 16:06:57,332][00191] Num frames 5600... [2024-09-29 16:06:57,502][00191] Num frames 5700... [2024-09-29 16:06:57,689][00191] Num frames 5800... [2024-09-29 16:06:57,837][00191] Avg episode rewards: #0: 14.189, true rewards: #0: 7.314 [2024-09-29 16:06:57,841][00191] Avg episode reward: 14.189, avg true_objective: 7.314 [2024-09-29 16:06:57,925][00191] Num frames 5900... [2024-09-29 16:06:58,095][00191] Num frames 6000... [2024-09-29 16:06:58,265][00191] Num frames 6100... [2024-09-29 16:06:58,433][00191] Num frames 6200... [2024-09-29 16:06:58,616][00191] Num frames 6300... [2024-09-29 16:06:58,795][00191] Num frames 6400... [2024-09-29 16:06:58,983][00191] Num frames 6500... [2024-09-29 16:06:59,169][00191] Num frames 6600... [2024-09-29 16:06:59,341][00191] Num frames 6700... [2024-09-29 16:06:59,512][00191] Num frames 6800... [2024-09-29 16:06:59,741][00191] Avg episode rewards: #0: 15.107, true rewards: #0: 7.662 [2024-09-29 16:06:59,745][00191] Avg episode reward: 15.107, avg true_objective: 7.662 [2024-09-29 16:06:59,752][00191] Num frames 6900... [2024-09-29 16:06:59,885][00191] Num frames 7000... [2024-09-29 16:07:00,006][00191] Num frames 7100... [2024-09-29 16:07:00,129][00191] Num frames 7200... [2024-09-29 16:07:00,255][00191] Num frames 7300... [2024-09-29 16:07:00,383][00191] Num frames 7400... [2024-09-29 16:07:00,509][00191] Num frames 7500... [2024-09-29 16:07:00,655][00191] Num frames 7600... [2024-09-29 16:07:00,784][00191] Num frames 7700... [2024-09-29 16:07:00,914][00191] Num frames 7800... [2024-09-29 16:07:01,041][00191] Num frames 7900... [2024-09-29 16:07:01,161][00191] Num frames 8000... [2024-09-29 16:07:01,286][00191] Num frames 8100... [2024-09-29 16:07:01,407][00191] Num frames 8200... [2024-09-29 16:07:01,530][00191] Num frames 8300... [2024-09-29 16:07:01,661][00191] Num frames 8400... [2024-09-29 16:07:01,790][00191] Num frames 8500... [2024-09-29 16:07:01,926][00191] Num frames 8600... [2024-09-29 16:07:02,049][00191] Num frames 8700... [2024-09-29 16:07:02,176][00191] Num frames 8800... [2024-09-29 16:07:02,300][00191] Num frames 8900... 
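Each evaluation episode closes with an "Avg episode rewards" line (the tenth follows immediately below): a running mean over all episodes completed so far, with "true rewards" tracking the raw objective separately from the reward signal the agent was trained on. A sketch of that bookkeeping, assuming the two are simply two reward streams accumulated per episode:

    def episode_summary(episode_rewards, episode_true_rewards):
        """Running means over completed episodes, like the
        'Avg episode rewards: #0: ..., true rewards: #0: ...' lines."""
        n = len(episode_rewards)
        avg = sum(episode_rewards) / n
        avg_true = sum(episode_true_rewards) / n
        return f"Avg episode rewards: #0: {avg:.3f}, true rewards: #0: {avg_true:.3f}"

    # after the first three episodes above, rewards [8.48, 36.31, 9.12] and
    # true objectives [4.48, 15.31, 5.12] reproduce the logged running means
    # 8.480/4.480, 22.395/9.895 and 17.970/8.303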
[2024-09-29 16:07:02,475][00191] Avg episode rewards: #0: 19.596, true rewards: #0: 8.996 [2024-09-29 16:07:02,478][00191] Avg episode reward: 19.596, avg true_objective: 8.996 [2024-09-29 16:07:54,084][00191] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-29 16:10:31,478][00191] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-29 16:10:31,483][00191] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-29 16:10:31,485][00191] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-29 16:10:31,487][00191] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-29 16:10:31,491][00191] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-29 16:10:31,492][00191] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-29 16:10:31,495][00191] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-29 16:10:31,496][00191] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-29 16:10:31,498][00191] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-29 16:10:31,500][00191] Adding new argument 'hf_repository'='esperesa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-29 16:10:31,501][00191] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-29 16:10:31,507][00191] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-29 16:10:31,510][00191] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-29 16:10:31,512][00191] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-29 16:10:31,514][00191] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-29 16:10:31,552][00191] RunningMeanStd input shape: (3, 72, 128) [2024-09-29 16:10:31,555][00191] RunningMeanStd input shape: (1,) [2024-09-29 16:10:31,579][00191] ConvEncoder: input_channels=3 [2024-09-29 16:10:31,638][00191] Conv encoder output size: 512 [2024-09-29 16:10:31,639][00191] Policy head output size: 512 [2024-09-29 16:10:31,663][00191] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-29 16:10:32,067][00191] Num frames 100... [2024-09-29 16:10:32,192][00191] Num frames 200... [2024-09-29 16:10:32,320][00191] Num frames 300... [2024-09-29 16:10:32,444][00191] Num frames 400... [2024-09-29 16:10:32,575][00191] Num frames 500... [2024-09-29 16:10:32,693][00191] Num frames 600... [2024-09-29 16:10:32,815][00191] Num frames 700... [2024-09-29 16:10:32,921][00191] Avg episode rewards: #0: 14.410, true rewards: #0: 7.410 [2024-09-29 16:10:32,923][00191] Avg episode reward: 14.410, avg true_objective: 7.410 [2024-09-29 16:10:32,993][00191] Num frames 800... [2024-09-29 16:10:33,121][00191] Num frames 900... [2024-09-29 16:10:33,245][00191] Num frames 1000... [2024-09-29 16:10:33,372][00191] Num frames 1100... [2024-09-29 16:10:33,441][00191] Avg episode rewards: #0: 11.555, true rewards: #0: 5.555 [2024-09-29 16:10:33,442][00191] Avg episode reward: 11.555, avg true_objective: 5.555 [2024-09-29 16:10:33,553][00191] Num frames 1200... [2024-09-29 16:10:33,677][00191] Num frames 1300... [2024-09-29 16:10:33,800][00191] Num frames 1400... 
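This second evaluation pass was launched with push_to_hub=True and hf_repository='esperesa/rl_course_vizdoom_health_gathering_supreme', so once the replay at the end of this log is rendered, the experiment directory is uploaded to the Hugging Face Hub. A hedged sketch of what that upload amounts to, written against the public huggingface_hub API rather than Sample Factory's own wrapper (which also generates a model card); the remaining evaluation episodes continue below:

    from huggingface_hub import HfApi

    def push_experiment(train_dir, repo_id):
        """Upload checkpoints, config.json and replay.mp4 to the Hub.
        Sketch only, not Sample Factory's push_to_hub implementation."""
        api = HfApi()
        api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
        api.upload_folder(folder_path=train_dir, repo_id=repo_id, repo_type="model")

    # push_experiment("/content/train_dir/default_experiment",
    #                 "esperesa/rl_course_vizdoom_health_gathering_supreme")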
[2024-09-29 16:10:33,924][00191] Num frames 1500... [2024-09-29 16:10:34,051][00191] Avg episode rewards: #0: 9.530, true rewards: #0: 5.197 [2024-09-29 16:10:34,052][00191] Avg episode reward: 9.530, avg true_objective: 5.197 [2024-09-29 16:10:34,103][00191] Num frames 1600... [2024-09-29 16:10:34,226][00191] Num frames 1700... [2024-09-29 16:10:34,349][00191] Num frames 1800... [2024-09-29 16:10:34,479][00191] Num frames 1900... [2024-09-29 16:10:34,612][00191] Num frames 2000... [2024-09-29 16:10:34,732][00191] Num frames 2100... [2024-09-29 16:10:34,854][00191] Num frames 2200... [2024-09-29 16:10:34,979][00191] Num frames 2300... [2024-09-29 16:10:35,100][00191] Num frames 2400... [2024-09-29 16:10:35,258][00191] Avg episode rewards: #0: 11.718, true rewards: #0: 6.217 [2024-09-29 16:10:35,259][00191] Avg episode reward: 11.718, avg true_objective: 6.217 [2024-09-29 16:10:35,278][00191] Num frames 2500... [2024-09-29 16:10:35,412][00191] Num frames 2600... [2024-09-29 16:10:35,534][00191] Num frames 2700... [2024-09-29 16:10:35,665][00191] Num frames 2800... [2024-09-29 16:10:35,784][00191] Num frames 2900... [2024-09-29 16:10:35,911][00191] Num frames 3000... [2024-09-29 16:10:36,080][00191] Avg episode rewards: #0: 11.190, true rewards: #0: 6.190 [2024-09-29 16:10:36,082][00191] Avg episode reward: 11.190, avg true_objective: 6.190 [2024-09-29 16:10:36,090][00191] Num frames 3100... [2024-09-29 16:10:36,208][00191] Num frames 3200... [2024-09-29 16:10:36,331][00191] Num frames 3300... [2024-09-29 16:10:36,471][00191] Num frames 3400... [2024-09-29 16:10:36,606][00191] Num frames 3500... [2024-09-29 16:10:36,731][00191] Num frames 3600... [2024-09-29 16:10:36,856][00191] Num frames 3700... [2024-09-29 16:10:36,983][00191] Num frames 3800... [2024-09-29 16:10:37,108][00191] Num frames 3900... [2024-09-29 16:10:37,233][00191] Num frames 4000... [2024-09-29 16:10:37,355][00191] Num frames 4100... [2024-09-29 16:10:37,485][00191] Num frames 4200... [2024-09-29 16:10:37,616][00191] Num frames 4300... [2024-09-29 16:10:37,735][00191] Num frames 4400... [2024-09-29 16:10:37,853][00191] Num frames 4500... [2024-09-29 16:10:37,973][00191] Num frames 4600... [2024-09-29 16:10:38,091][00191] Num frames 4700... [2024-09-29 16:10:38,215][00191] Num frames 4800... [2024-09-29 16:10:38,335][00191] Num frames 4900... [2024-09-29 16:10:38,467][00191] Num frames 5000... [2024-09-29 16:10:38,609][00191] Num frames 5100... [2024-09-29 16:10:38,782][00191] Avg episode rewards: #0: 18.991, true rewards: #0: 8.658 [2024-09-29 16:10:38,784][00191] Avg episode reward: 18.991, avg true_objective: 8.658 [2024-09-29 16:10:38,793][00191] Num frames 5200... [2024-09-29 16:10:38,915][00191] Num frames 5300... [2024-09-29 16:10:39,036][00191] Num frames 5400... [2024-09-29 16:10:39,159][00191] Num frames 5500... [2024-09-29 16:10:39,295][00191] Num frames 5600... [2024-09-29 16:10:39,418][00191] Num frames 5700... [2024-09-29 16:10:39,554][00191] Num frames 5800... [2024-09-29 16:10:39,696][00191] Num frames 5900... [2024-09-29 16:10:39,820][00191] Num frames 6000... [2024-09-29 16:10:39,946][00191] Num frames 6100... [2024-09-29 16:10:40,067][00191] Num frames 6200... [2024-09-29 16:10:40,186][00191] Avg episode rewards: #0: 19.216, true rewards: #0: 8.930 [2024-09-29 16:10:40,188][00191] Avg episode reward: 19.216, avg true_objective: 8.930 [2024-09-29 16:10:40,250][00191] Num frames 6300... [2024-09-29 16:10:40,371][00191] Num frames 6400... [2024-09-29 16:10:40,496][00191] Num frames 6500... 
[2024-09-29 16:10:40,637][00191] Num frames 6600... [2024-09-29 16:10:40,761][00191] Num frames 6700... [2024-09-29 16:10:40,886][00191] Num frames 6800... [2024-09-29 16:10:41,008][00191] Num frames 6900... [2024-09-29 16:10:41,132][00191] Num frames 7000... [2024-09-29 16:10:41,252][00191] Num frames 7100... [2024-09-29 16:10:41,379][00191] Num frames 7200... [2024-09-29 16:10:41,506][00191] Num frames 7300... [2024-09-29 16:10:41,668][00191] Num frames 7400... [2024-09-29 16:10:41,854][00191] Num frames 7500... [2024-09-29 16:10:41,917][00191] Avg episode rewards: #0: 20.502, true rewards: #0: 9.377 [2024-09-29 16:10:41,920][00191] Avg episode reward: 20.502, avg true_objective: 9.377 [2024-09-29 16:10:42,082][00191] Num frames 7600... [2024-09-29 16:10:42,249][00191] Num frames 7700... [2024-09-29 16:10:42,421][00191] Num frames 7800... [2024-09-29 16:10:42,593][00191] Num frames 7900... [2024-09-29 16:10:42,762][00191] Num frames 8000... [2024-09-29 16:10:42,933][00191] Num frames 8100... [2024-09-29 16:10:43,108][00191] Num frames 8200... [2024-09-29 16:10:43,231][00191] Avg episode rewards: #0: 19.709, true rewards: #0: 9.153 [2024-09-29 16:10:43,233][00191] Avg episode reward: 19.709, avg true_objective: 9.153 [2024-09-29 16:10:43,344][00191] Num frames 8300... [2024-09-29 16:10:43,532][00191] Num frames 8400... [2024-09-29 16:10:43,719][00191] Num frames 8500... [2024-09-29 16:10:43,899][00191] Num frames 8600... [2024-09-29 16:10:44,063][00191] Num frames 8700... [2024-09-29 16:10:44,190][00191] Num frames 8800... [2024-09-29 16:10:44,313][00191] Num frames 8900... [2024-09-29 16:10:44,437][00191] Num frames 9000... [2024-09-29 16:10:44,573][00191] Num frames 9100... [2024-09-29 16:10:44,700][00191] Num frames 9200... [2024-09-29 16:10:44,824][00191] Num frames 9300... [2024-09-29 16:10:44,948][00191] Num frames 9400... [2024-09-29 16:10:45,071][00191] Num frames 9500... [2024-09-29 16:10:45,197][00191] Num frames 9600... [2024-09-29 16:10:45,277][00191] Avg episode rewards: #0: 21.019, true rewards: #0: 9.619 [2024-09-29 16:10:45,280][00191] Avg episode reward: 21.019, avg true_objective: 9.619 [2024-09-29 16:11:39,958][00191] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
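Both evaluation passes end with "Replay video saved to /content/train_dir/default_experiment/replay.mp4!". The frames ticked off in the "Num frames" lines are buffered and encoded once the episodes finish; a minimal sketch using imageio (the encoder choice is an assumption, not read from this log; fps=35 matches Doom's native tick rate):

    import imageio

    def save_replay(frames, path="/content/train_dir/default_experiment/replay.mp4", fps=35):
        """Encode a sequence of HxWx3 uint8 frames to mp4, as in the
        'Replay video saved to ...' lines. Hypothetical helper; the actual
        video backend used by the enjoy script may differ."""
        with imageio.get_writer(path, fps=fps) as writer:
            for frame in frames:
                writer.append_data(frame)
        print(f"Replay video saved to {path}!")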