[2024-12-23 14:33:40,778][00245] Saving configuration to /content/train_dir/default_experiment/config.json... [2024-12-23 14:33:40,786][00245] Rollout worker 0 uses device cpu [2024-12-23 14:33:40,787][00245] Rollout worker 1 uses device cpu [2024-12-23 14:33:40,790][00245] Rollout worker 2 uses device cpu [2024-12-23 14:33:40,792][00245] Rollout worker 3 uses device cpu [2024-12-23 14:33:40,794][00245] Rollout worker 4 uses device cpu [2024-12-23 14:33:40,795][00245] Rollout worker 5 uses device cpu [2024-12-23 14:33:40,797][00245] Rollout worker 6 uses device cpu [2024-12-23 14:33:40,799][00245] Rollout worker 7 uses device cpu [2024-12-23 14:33:41,075][00245] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-12-23 14:33:41,081][00245] InferenceWorker_p0-w0: min num requests: 2 [2024-12-23 14:33:41,194][00245] Starting all processes... [2024-12-23 14:33:41,206][00245] Starting process learner_proc0 [2024-12-23 14:33:41,384][00245] Starting all processes... [2024-12-23 14:33:41,426][00245] Starting process inference_proc0-0 [2024-12-23 14:33:41,426][00245] Starting process rollout_proc0 [2024-12-23 14:33:41,428][00245] Starting process rollout_proc1 [2024-12-23 14:33:41,432][00245] Starting process rollout_proc2 [2024-12-23 14:33:41,432][00245] Starting process rollout_proc3 [2024-12-23 14:33:41,432][00245] Starting process rollout_proc4 [2024-12-23 14:33:41,432][00245] Starting process rollout_proc5 [2024-12-23 14:33:41,432][00245] Starting process rollout_proc6 [2024-12-23 14:33:41,432][00245] Starting process rollout_proc7 [2024-12-23 14:34:01,041][02388] Worker 5 uses CPU cores [1] [2024-12-23 14:34:01,241][02391] Worker 7 uses CPU cores [1] [2024-12-23 14:34:01,304][02370] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-12-23 14:34:01,309][02370] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-12-23 14:34:01,347][02370] Num visible devices: 1 [2024-12-23 14:34:01,359][00245] Heartbeat 
connected on RolloutWorker_w5 [2024-12-23 14:34:01,387][00245] Heartbeat connected on RolloutWorker_w7 [2024-12-23 14:34:01,391][02370] Starting seed is not provided [2024-12-23 14:34:01,392][02370] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-12-23 14:34:01,392][02370] Initializing actor-critic model on device cuda:0 [2024-12-23 14:34:01,393][02370] RunningMeanStd input shape: (3, 72, 128) [2024-12-23 14:34:01,394][00245] Heartbeat connected on Batcher_0 [2024-12-23 14:34:01,398][02370] RunningMeanStd input shape: (1,) [2024-12-23 14:34:01,450][02385] Worker 1 uses CPU cores [1] [2024-12-23 14:34:01,480][02370] ConvEncoder: input_channels=3 [2024-12-23 14:34:01,497][02383] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-12-23 14:34:01,498][02383] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-12-23 14:34:01,511][02386] Worker 2 uses CPU cores [0] [2024-12-23 14:34:01,580][00245] Heartbeat connected on RolloutWorker_w1 [2024-12-23 14:34:01,634][02389] Worker 4 uses CPU cores [0] [2024-12-23 14:34:01,641][02383] Num visible devices: 1 [2024-12-23 14:34:01,655][02384] Worker 0 uses CPU cores [0] [2024-12-23 14:34:01,665][00245] Heartbeat connected on InferenceWorker_p0-w0 [2024-12-23 14:34:01,681][00245] Heartbeat connected on RolloutWorker_w2 [2024-12-23 14:34:01,703][02390] Worker 6 uses CPU cores [0] [2024-12-23 14:34:01,738][02387] Worker 3 uses CPU cores [1] [2024-12-23 14:34:01,745][00245] Heartbeat connected on RolloutWorker_w4 [2024-12-23 14:34:01,755][00245] Heartbeat connected on RolloutWorker_w0 [2024-12-23 14:34:01,780][00245] Heartbeat connected on RolloutWorker_w6 [2024-12-23 14:34:01,800][00245] Heartbeat connected on RolloutWorker_w3 [2024-12-23 14:34:02,024][02370] Conv encoder output size: 512 [2024-12-23 14:34:02,025][02370] Policy head output size: 512 [2024-12-23 14:34:02,111][02370] Created Actor Critic model with architecture: [2024-12-23 14:34:02,112][02370] 
ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-23 14:34:02,430][02370] Using optimizer
[2024-12-23 14:34:05,761][02370] No checkpoints found
[2024-12-23 14:34:05,762][02370] Did not load from checkpoint, starting from scratch!
[2024-12-23 14:34:05,762][02370] Initialized policy 0 weights for model version 0
[2024-12-23 14:34:05,766][02370] LearnerWorker_p0 finished initialization!
[2024-12-23 14:34:05,767][00245] Heartbeat connected on LearnerWorker_p0 [2024-12-23 14:34:05,766][02370] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-12-23 14:34:05,864][02383] RunningMeanStd input shape: (3, 72, 128) [2024-12-23 14:34:05,865][02383] RunningMeanStd input shape: (1,) [2024-12-23 14:34:05,877][02383] ConvEncoder: input_channels=3 [2024-12-23 14:34:05,982][02383] Conv encoder output size: 512 [2024-12-23 14:34:05,983][02383] Policy head output size: 512 [2024-12-23 14:34:06,037][00245] Inference worker 0-0 is ready! [2024-12-23 14:34:06,038][00245] All inference workers are ready! Signal rollout workers to start! [2024-12-23 14:34:06,246][02387] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,245][02391] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,247][02385] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,252][02388] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,270][02389] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,271][02386] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,266][02384] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:06,274][02390] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 14:34:07,126][00245] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 14:34:07,668][02386] Decorrelating experience for 0 frames... [2024-12-23 14:34:07,666][02384] Decorrelating experience for 0 frames... [2024-12-23 14:34:07,669][02389] Decorrelating experience for 0 frames... [2024-12-23 14:34:07,938][02387] Decorrelating experience for 0 frames... [2024-12-23 14:34:07,945][02391] Decorrelating experience for 0 frames... [2024-12-23 14:34:07,949][02388] Decorrelating experience for 0 frames... 
[2024-12-23 14:34:07,961][02385] Decorrelating experience for 0 frames... [2024-12-23 14:34:09,680][02388] Decorrelating experience for 32 frames... [2024-12-23 14:34:09,699][02386] Decorrelating experience for 32 frames... [2024-12-23 14:34:09,707][02385] Decorrelating experience for 32 frames... [2024-12-23 14:34:09,708][02389] Decorrelating experience for 32 frames... [2024-12-23 14:34:10,195][02384] Decorrelating experience for 32 frames... [2024-12-23 14:34:10,228][02390] Decorrelating experience for 0 frames... [2024-12-23 14:34:11,626][02390] Decorrelating experience for 32 frames... [2024-12-23 14:34:11,695][02389] Decorrelating experience for 64 frames... [2024-12-23 14:34:11,897][02391] Decorrelating experience for 32 frames... [2024-12-23 14:34:12,131][00245] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 14:34:12,181][02387] Decorrelating experience for 32 frames... [2024-12-23 14:34:12,197][02384] Decorrelating experience for 64 frames... [2024-12-23 14:34:12,358][02388] Decorrelating experience for 64 frames... [2024-12-23 14:34:13,416][02386] Decorrelating experience for 64 frames... [2024-12-23 14:34:13,660][02385] Decorrelating experience for 64 frames... [2024-12-23 14:34:13,997][02390] Decorrelating experience for 64 frames... [2024-12-23 14:34:14,123][02387] Decorrelating experience for 64 frames... [2024-12-23 14:34:14,164][02384] Decorrelating experience for 96 frames... [2024-12-23 14:34:15,288][02389] Decorrelating experience for 96 frames... [2024-12-23 14:34:15,326][02391] Decorrelating experience for 64 frames... [2024-12-23 14:34:15,440][02386] Decorrelating experience for 96 frames... [2024-12-23 14:34:15,457][02385] Decorrelating experience for 96 frames... [2024-12-23 14:34:15,859][02387] Decorrelating experience for 96 frames... [2024-12-23 14:34:16,298][02390] Decorrelating experience for 96 frames... 
[2024-12-23 14:34:16,536][02388] Decorrelating experience for 96 frames... [2024-12-23 14:34:16,627][02391] Decorrelating experience for 96 frames... [2024-12-23 14:34:17,126][00245] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.8. Samples: 28. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 14:34:17,128][00245] Avg episode reward: [(0, '0.372')] [2024-12-23 14:34:19,086][02370] Signal inference workers to stop experience collection... [2024-12-23 14:34:19,100][02383] InferenceWorker_p0-w0: stopping experience collection [2024-12-23 14:34:22,126][00245] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 146.9. Samples: 2204. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 14:34:22,127][00245] Avg episode reward: [(0, '1.594')] [2024-12-23 14:34:22,206][02370] Signal inference workers to resume experience collection... [2024-12-23 14:34:22,207][02383] InferenceWorker_p0-w0: resuming experience collection [2024-12-23 14:34:27,129][00245] Fps is (10 sec: 2047.4, 60 sec: 1023.8, 300 sec: 1023.8). Total num frames: 20480. Throughput: 0: 319.6. Samples: 6392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-12-23 14:34:27,131][00245] Avg episode reward: [(0, '3.366')] [2024-12-23 14:34:32,126][00245] Fps is (10 sec: 3686.4, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 338.3. Samples: 8458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:34:32,131][00245] Avg episode reward: [(0, '3.649')] [2024-12-23 14:34:32,897][02383] Updated weights for policy 0, policy_version 10 (0.0157) [2024-12-23 14:34:37,126][00245] Fps is (10 sec: 3687.5, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 484.3. Samples: 14530. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:34:37,133][00245] Avg episode reward: [(0, '4.491')] [2024-12-23 14:34:41,916][02383] Updated weights for policy 0, policy_version 20 (0.0021) [2024-12-23 14:34:42,126][00245] Fps is (10 sec: 4505.6, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 595.3. Samples: 20836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:34:42,129][00245] Avg episode reward: [(0, '4.422')] [2024-12-23 14:34:47,126][00245] Fps is (10 sec: 3686.4, 60 sec: 2355.2, 300 sec: 2355.2). Total num frames: 94208. Throughput: 0: 571.8. Samples: 22872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 14:34:47,131][00245] Avg episode reward: [(0, '4.234')] [2024-12-23 14:34:52,126][00245] Fps is (10 sec: 3276.8, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 114688. Throughput: 0: 625.4. Samples: 28144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:34:52,128][00245] Avg episode reward: [(0, '4.302')] [2024-12-23 14:34:52,136][02370] Saving new best policy, reward=4.302! [2024-12-23 14:34:53,824][02383] Updated weights for policy 0, policy_version 30 (0.0023) [2024-12-23 14:34:57,126][00245] Fps is (10 sec: 4096.0, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 779.3. Samples: 35066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:34:57,130][00245] Avg episode reward: [(0, '4.501')] [2024-12-23 14:34:57,205][02370] Saving new best policy, reward=4.501! [2024-12-23 14:35:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 843.6. Samples: 37990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:35:02,132][00245] Avg episode reward: [(0, '4.490')] [2024-12-23 14:35:05,966][02383] Updated weights for policy 0, policy_version 40 (0.0032) [2024-12-23 14:35:07,126][00245] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. 
Throughput: 0: 874.3. Samples: 41546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:35:07,134][00245] Avg episode reward: [(0, '4.509')] [2024-12-23 14:35:07,136][02370] Saving new best policy, reward=4.509! [2024-12-23 14:35:12,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3140.5, 300 sec: 2898.7). Total num frames: 188416. Throughput: 0: 928.9. Samples: 48188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:35:12,128][00245] Avg episode reward: [(0, '4.501')] [2024-12-23 14:35:15,198][02383] Updated weights for policy 0, policy_version 50 (0.0019) [2024-12-23 14:35:17,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 958.8. Samples: 51602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:35:17,131][00245] Avg episode reward: [(0, '4.464')] [2024-12-23 14:35:22,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 922.8. Samples: 56058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:35:22,128][00245] Avg episode reward: [(0, '4.412')] [2024-12-23 14:35:27,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 912.7. Samples: 61908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:35:27,129][00245] Avg episode reward: [(0, '4.363')] [2024-12-23 14:35:27,228][02383] Updated weights for policy 0, policy_version 60 (0.0020) [2024-12-23 14:35:32,126][00245] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 945.9. Samples: 65436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:35:32,128][00245] Avg episode reward: [(0, '4.465')] [2024-12-23 14:35:32,134][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth... [2024-12-23 14:35:37,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3140.3). Total num frames: 282624. 
Throughput: 0: 947.6. Samples: 70786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:35:37,133][00245] Avg episode reward: [(0, '4.321')] [2024-12-23 14:35:37,983][02383] Updated weights for policy 0, policy_version 70 (0.0025) [2024-12-23 14:35:42,126][00245] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 910.7. Samples: 76046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:35:42,132][00245] Avg episode reward: [(0, '4.282')] [2024-12-23 14:35:47,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 920.1. Samples: 79396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:35:47,135][00245] Avg episode reward: [(0, '4.327')] [2024-12-23 14:35:47,850][02383] Updated weights for policy 0, policy_version 80 (0.0013) [2024-12-23 14:35:52,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 344064. Throughput: 0: 981.3. Samples: 85706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:35:52,130][00245] Avg episode reward: [(0, '4.351')] [2024-12-23 14:35:57,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3239.6). Total num frames: 356352. Throughput: 0: 931.6. Samples: 90108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:35:57,131][00245] Avg episode reward: [(0, '4.361')] [2024-12-23 14:35:59,175][02383] Updated weights for policy 0, policy_version 90 (0.0028) [2024-12-23 14:36:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 932.8. Samples: 93578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:36:02,128][00245] Avg episode reward: [(0, '4.413')] [2024-12-23 14:36:07,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 988.4. Samples: 100534. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:36:07,127][00245] Avg episode reward: [(0, '4.475')] [2024-12-23 14:36:09,255][02383] Updated weights for policy 0, policy_version 100 (0.0024) [2024-12-23 14:36:12,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 955.3. Samples: 104898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:36:12,132][00245] Avg episode reward: [(0, '4.520')] [2024-12-23 14:36:12,146][02370] Saving new best policy, reward=4.520! [2024-12-23 14:36:17,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3339.8). Total num frames: 434176. Throughput: 0: 935.7. Samples: 107542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:36:17,133][00245] Avg episode reward: [(0, '4.326')] [2024-12-23 14:36:20,101][02383] Updated weights for policy 0, policy_version 110 (0.0016) [2024-12-23 14:36:22,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3398.2). Total num frames: 458752. Throughput: 0: 968.4. Samples: 114362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:36:22,136][00245] Avg episode reward: [(0, '4.242')] [2024-12-23 14:36:27,128][00245] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 972.2. Samples: 119796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:36:27,130][00245] Avg episode reward: [(0, '4.247')] [2024-12-23 14:36:31,643][02383] Updated weights for policy 0, policy_version 120 (0.0029) [2024-12-23 14:36:32,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 944.8. Samples: 121914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:36:32,128][00245] Avg episode reward: [(0, '4.369')] [2024-12-23 14:36:37,126][00245] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 516096. Throughput: 0: 948.6. Samples: 128392. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:36:37,132][00245] Avg episode reward: [(0, '4.596')] [2024-12-23 14:36:37,137][02370] Saving new best policy, reward=4.596! [2024-12-23 14:36:40,551][02383] Updated weights for policy 0, policy_version 130 (0.0025) [2024-12-23 14:36:42,128][00245] Fps is (10 sec: 4504.6, 60 sec: 3959.3, 300 sec: 3461.7). Total num frames: 536576. Throughput: 0: 989.5. Samples: 134636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:36:42,130][00245] Avg episode reward: [(0, '4.690')] [2024-12-23 14:36:42,139][02370] Saving new best policy, reward=4.690! [2024-12-23 14:36:47,126][00245] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 956.4. Samples: 136618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:36:47,133][00245] Avg episode reward: [(0, '4.652')] [2024-12-23 14:36:52,126][00245] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 925.3. Samples: 142172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:36:52,129][00245] Avg episode reward: [(0, '4.513')] [2024-12-23 14:36:52,640][02383] Updated weights for policy 0, policy_version 140 (0.0025) [2024-12-23 14:36:57,126][00245] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3493.6). Total num frames: 593920. Throughput: 0: 982.4. Samples: 149106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:36:57,130][00245] Avg episode reward: [(0, '4.270')] [2024-12-23 14:37:02,127][00245] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3464.0). Total num frames: 606208. Throughput: 0: 980.2. Samples: 151654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:37:02,132][00245] Avg episode reward: [(0, '4.458')] [2024-12-23 14:37:03,827][02383] Updated weights for policy 0, policy_version 150 (0.0025) [2024-12-23 14:37:07,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 626688. 
Throughput: 0: 929.5. Samples: 156190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:37:07,128][00245] Avg episode reward: [(0, '4.575')] [2024-12-23 14:37:12,126][00245] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3498.2). Total num frames: 647168. Throughput: 0: 964.2. Samples: 163182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:37:12,128][00245] Avg episode reward: [(0, '4.597')] [2024-12-23 14:37:13,260][02383] Updated weights for policy 0, policy_version 160 (0.0020) [2024-12-23 14:37:17,126][00245] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3513.9). Total num frames: 667648. Throughput: 0: 990.2. Samples: 166474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:37:17,128][00245] Avg episode reward: [(0, '4.763')] [2024-12-23 14:37:17,138][02370] Saving new best policy, reward=4.763! [2024-12-23 14:37:22,127][00245] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3486.8). Total num frames: 679936. Throughput: 0: 936.3. Samples: 170526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 14:37:22,130][00245] Avg episode reward: [(0, '4.659')] [2024-12-23 14:37:25,171][02383] Updated weights for policy 0, policy_version 170 (0.0022) [2024-12-23 14:37:27,128][00245] Fps is (10 sec: 3685.6, 60 sec: 3822.9, 300 sec: 3522.5). Total num frames: 704512. Throughput: 0: 939.3. Samples: 176904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:37:27,131][00245] Avg episode reward: [(0, '4.535')] [2024-12-23 14:37:32,126][00245] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3536.5). Total num frames: 724992. Throughput: 0: 969.9. Samples: 180264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:37:32,128][00245] Avg episode reward: [(0, '4.660')] [2024-12-23 14:37:32,137][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth... 
[2024-12-23 14:37:35,262][02383] Updated weights for policy 0, policy_version 180 (0.0023) [2024-12-23 14:37:37,126][00245] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 960.6. Samples: 185400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:37:37,131][00245] Avg episode reward: [(0, '4.851')] [2024-12-23 14:37:37,135][02370] Saving new best policy, reward=4.851! [2024-12-23 14:37:42,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3524.5). Total num frames: 757760. Throughput: 0: 928.9. Samples: 190908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:37:42,131][00245] Avg episode reward: [(0, '4.843')] [2024-12-23 14:37:45,852][02383] Updated weights for policy 0, policy_version 190 (0.0027) [2024-12-23 14:37:47,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3556.1). Total num frames: 782336. Throughput: 0: 945.7. Samples: 194208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:37:47,132][00245] Avg episode reward: [(0, '4.704')] [2024-12-23 14:37:52,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 798720. Throughput: 0: 981.2. Samples: 200344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:37:52,130][00245] Avg episode reward: [(0, '4.739')] [2024-12-23 14:37:57,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3543.9). Total num frames: 815104. Throughput: 0: 921.9. Samples: 204668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:37:57,132][00245] Avg episode reward: [(0, '4.664')] [2024-12-23 14:37:57,745][02383] Updated weights for policy 0, policy_version 200 (0.0019) [2024-12-23 14:38:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3555.7). Total num frames: 835584. Throughput: 0: 924.4. Samples: 208072. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:02,132][00245] Avg episode reward: [(0, '4.657')] [2024-12-23 14:38:06,842][02383] Updated weights for policy 0, policy_version 210 (0.0024) [2024-12-23 14:38:07,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 984.9. Samples: 214844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:07,134][00245] Avg episode reward: [(0, '4.686')] [2024-12-23 14:38:12,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 934.1. Samples: 218938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:12,128][00245] Avg episode reward: [(0, '4.617')] [2024-12-23 14:38:17,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 919.2. Samples: 221626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:38:17,130][00245] Avg episode reward: [(0, '4.851')] [2024-12-23 14:38:18,818][02383] Updated weights for policy 0, policy_version 220 (0.0016) [2024-12-23 14:38:22,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3582.0). Total num frames: 913408. Throughput: 0: 954.0. Samples: 228332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:38:22,130][00245] Avg episode reward: [(0, '4.690')] [2024-12-23 14:38:27,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3576.1). Total num frames: 929792. Throughput: 0: 944.0. Samples: 233388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:38:27,131][00245] Avg episode reward: [(0, '4.649')] [2024-12-23 14:38:30,703][02383] Updated weights for policy 0, policy_version 230 (0.0025) [2024-12-23 14:38:32,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3570.5). Total num frames: 946176. Throughput: 0: 914.2. Samples: 235346. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:38:32,132][00245] Avg episode reward: [(0, '4.815')] [2024-12-23 14:38:37,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 966656. Throughput: 0: 924.0. Samples: 241926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:37,131][00245] Avg episode reward: [(0, '4.782')] [2024-12-23 14:38:40,016][02383] Updated weights for policy 0, policy_version 240 (0.0024) [2024-12-23 14:38:42,126][00245] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3589.6). Total num frames: 987136. Throughput: 0: 963.4. Samples: 248022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:42,129][00245] Avg episode reward: [(0, '4.658')] [2024-12-23 14:38:47,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3569.4). Total num frames: 999424. Throughput: 0: 929.6. Samples: 249906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:47,134][00245] Avg episode reward: [(0, '4.703')] [2024-12-23 14:38:52,038][02383] Updated weights for policy 0, policy_version 250 (0.0028) [2024-12-23 14:38:52,126][00245] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3593.0). Total num frames: 1024000. Throughput: 0: 905.6. Samples: 255596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:38:52,133][00245] Avg episode reward: [(0, '4.924')] [2024-12-23 14:38:52,142][02370] Saving new best policy, reward=4.924! [2024-12-23 14:38:57,126][00245] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3601.7). Total num frames: 1044480. Throughput: 0: 962.6. Samples: 262256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:38:57,132][00245] Avg episode reward: [(0, '4.941')] [2024-12-23 14:38:57,137][02370] Saving new best policy, reward=4.941! [2024-12-23 14:39:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 951.2. Samples: 264432. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 14:39:02,128][00245] Avg episode reward: [(0, '4.853')] [2024-12-23 14:39:03,388][02383] Updated weights for policy 0, policy_version 260 (0.0033) [2024-12-23 14:39:07,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 903.7. Samples: 269000. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 14:39:07,133][00245] Avg episode reward: [(0, '4.871')] [2024-12-23 14:39:12,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 944.3. Samples: 275880. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-23 14:39:12,128][00245] Avg episode reward: [(0, '5.071')] [2024-12-23 14:39:12,137][02370] Saving new best policy, reward=5.071! [2024-12-23 14:39:13,018][02383] Updated weights for policy 0, policy_version 270 (0.0017) [2024-12-23 14:39:17,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 967.2. Samples: 278872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:39:17,131][00245] Avg episode reward: [(0, '5.220')] [2024-12-23 14:39:17,134][02370] Saving new best policy, reward=5.220! [2024-12-23 14:39:22,126][00245] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1130496. Throughput: 0: 910.5. Samples: 282900. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-23 14:39:22,132][00245] Avg episode reward: [(0, '5.638')] [2024-12-23 14:39:22,140][02370] Saving new best policy, reward=5.638! [2024-12-23 14:39:24,967][02383] Updated weights for policy 0, policy_version 280 (0.0017) [2024-12-23 14:39:27,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1155072. Throughput: 0: 923.2. Samples: 289568. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:39:27,130][00245] Avg episode reward: [(0, '5.883')] [2024-12-23 14:39:27,134][02370] Saving new best policy, reward=5.883! [2024-12-23 14:39:32,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1175552. Throughput: 0: 958.9. Samples: 293058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:39:32,129][00245] Avg episode reward: [(0, '5.794')] [2024-12-23 14:39:32,146][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000287_1175552.pth... [2024-12-23 14:39:32,295][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth [2024-12-23 14:39:35,514][02383] Updated weights for policy 0, policy_version 290 (0.0022) [2024-12-23 14:39:37,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1191936. Throughput: 0: 942.0. Samples: 297988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:39:37,128][00245] Avg episode reward: [(0, '5.587')] [2024-12-23 14:39:42,126][00245] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1212416. Throughput: 0: 921.6. Samples: 303728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:39:42,130][00245] Avg episode reward: [(0, '5.549')] [2024-12-23 14:39:45,540][02383] Updated weights for policy 0, policy_version 300 (0.0020) [2024-12-23 14:39:47,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1232896. Throughput: 0: 949.3. Samples: 307152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:39:47,129][00245] Avg episode reward: [(0, '5.341')] [2024-12-23 14:39:52,126][00245] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1253376. Throughput: 0: 980.7. Samples: 313132. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:39:52,130][00245] Avg episode reward: [(0, '5.701')] [2024-12-23 14:39:57,080][02383] Updated weights for policy 0, policy_version 310 (0.0028) [2024-12-23 14:39:57,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1269760. Throughput: 0: 935.6. Samples: 317984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:39:57,130][00245] Avg episode reward: [(0, '5.998')] [2024-12-23 14:39:57,136][02370] Saving new best policy, reward=5.998! [2024-12-23 14:40:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1290240. Throughput: 0: 944.8. Samples: 321388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:40:02,129][00245] Avg episode reward: [(0, '5.898')] [2024-12-23 14:40:06,194][02383] Updated weights for policy 0, policy_version 320 (0.0035) [2024-12-23 14:40:07,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1310720. Throughput: 0: 1007.9. Samples: 328254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:40:07,131][00245] Avg episode reward: [(0, '5.526')] [2024-12-23 14:40:12,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1327104. Throughput: 0: 952.6. Samples: 332436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:40:12,128][00245] Avg episode reward: [(0, '5.290')] [2024-12-23 14:40:17,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1347584. Throughput: 0: 940.4. Samples: 335378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:40:17,127][00245] Avg episode reward: [(0, '5.324')] [2024-12-23 14:40:17,793][02383] Updated weights for policy 0, policy_version 330 (0.0015) [2024-12-23 14:40:22,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1372160. Throughput: 0: 990.1. Samples: 342542. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:40:22,132][00245] Avg episode reward: [(0, '5.318')] [2024-12-23 14:40:27,130][00245] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 1384448. Throughput: 0: 974.0. Samples: 347560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:40:27,134][00245] Avg episode reward: [(0, '5.036')] [2024-12-23 14:40:28,753][02383] Updated weights for policy 0, policy_version 340 (0.0030) [2024-12-23 14:40:32,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1404928. Throughput: 0: 949.9. Samples: 349896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:40:32,128][00245] Avg episode reward: [(0, '5.058')] [2024-12-23 14:40:37,126][00245] Fps is (10 sec: 4507.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1429504. Throughput: 0: 973.8. Samples: 356952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:40:37,128][00245] Avg episode reward: [(0, '5.238')] [2024-12-23 14:40:37,702][02383] Updated weights for policy 0, policy_version 350 (0.0015) [2024-12-23 14:40:42,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1445888. Throughput: 0: 996.6. Samples: 362832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:40:42,130][00245] Avg episode reward: [(0, '5.372')] [2024-12-23 14:40:47,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1462272. Throughput: 0: 966.4. Samples: 364874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:40:47,137][00245] Avg episode reward: [(0, '5.283')] [2024-12-23 14:40:49,493][02383] Updated weights for policy 0, policy_version 360 (0.0035) [2024-12-23 14:40:52,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1486848. Throughput: 0: 952.4. Samples: 371112. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:40:52,128][00245] Avg episode reward: [(0, '4.912')] [2024-12-23 14:40:57,132][00245] Fps is (10 sec: 4502.8, 60 sec: 3959.1, 300 sec: 3818.2). Total num frames: 1507328. Throughput: 0: 1011.2. Samples: 377946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:40:57,134][00245] Avg episode reward: [(0, '5.444')] [2024-12-23 14:40:59,227][02383] Updated weights for policy 0, policy_version 370 (0.0031) [2024-12-23 14:41:02,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1519616. Throughput: 0: 992.5. Samples: 380042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:41:02,132][00245] Avg episode reward: [(0, '5.550')] [2024-12-23 14:41:07,126][00245] Fps is (10 sec: 3688.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1544192. Throughput: 0: 954.1. Samples: 385478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:41:07,128][00245] Avg episode reward: [(0, '6.159')] [2024-12-23 14:41:07,130][02370] Saving new best policy, reward=6.159! [2024-12-23 14:41:09,759][02383] Updated weights for policy 0, policy_version 380 (0.0024) [2024-12-23 14:41:12,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1564672. Throughput: 0: 999.1. Samples: 392516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:41:12,130][00245] Avg episode reward: [(0, '6.427')] [2024-12-23 14:41:12,138][02370] Saving new best policy, reward=6.427! [2024-12-23 14:41:17,128][00245] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 1581056. Throughput: 0: 1006.4. Samples: 395188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:41:17,130][00245] Avg episode reward: [(0, '6.198')] [2024-12-23 14:41:21,508][02383] Updated weights for policy 0, policy_version 390 (0.0033) [2024-12-23 14:41:22,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). 
Total num frames: 1597440. Throughput: 0: 946.3. Samples: 399534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:41:22,130][00245] Avg episode reward: [(0, '6.176')] [2024-12-23 14:41:27,127][00245] Fps is (10 sec: 4096.4, 60 sec: 3959.7, 300 sec: 3832.2). Total num frames: 1622016. Throughput: 0: 971.3. Samples: 406542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:41:27,130][00245] Avg episode reward: [(0, '5.919')] [2024-12-23 14:41:30,103][02383] Updated weights for policy 0, policy_version 400 (0.0021) [2024-12-23 14:41:32,126][00245] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1642496. Throughput: 0: 1005.0. Samples: 410098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:41:32,130][00245] Avg episode reward: [(0, '6.126')] [2024-12-23 14:41:32,243][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000402_1646592.pth... [2024-12-23 14:41:32,407][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth [2024-12-23 14:41:37,126][00245] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1658880. Throughput: 0: 966.3. Samples: 414594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:41:37,131][00245] Avg episode reward: [(0, '6.548')] [2024-12-23 14:41:37,139][02370] Saving new best policy, reward=6.548! [2024-12-23 14:41:41,763][02383] Updated weights for policy 0, policy_version 410 (0.0024) [2024-12-23 14:41:42,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1679360. Throughput: 0: 951.1. Samples: 420740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:41:42,130][00245] Avg episode reward: [(0, '7.033')] [2024-12-23 14:41:42,139][02370] Saving new best policy, reward=7.033! [2024-12-23 14:41:47,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1699840. Throughput: 0: 982.9. Samples: 424272. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:41:47,133][00245] Avg episode reward: [(0, '7.437')] [2024-12-23 14:41:47,138][02370] Saving new best policy, reward=7.437! [2024-12-23 14:41:52,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1716224. Throughput: 0: 979.6. Samples: 429560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:41:52,130][00245] Avg episode reward: [(0, '6.965')] [2024-12-23 14:41:52,510][02383] Updated weights for policy 0, policy_version 420 (0.0020) [2024-12-23 14:41:57,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3832.2). Total num frames: 1736704. Throughput: 0: 939.7. Samples: 434802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:41:57,131][00245] Avg episode reward: [(0, '7.208')] [2024-12-23 14:42:02,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1757184. Throughput: 0: 957.4. Samples: 438270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:42:02,128][00245] Avg episode reward: [(0, '7.648')] [2024-12-23 14:42:02,138][02370] Saving new best policy, reward=7.648! [2024-12-23 14:42:02,372][02383] Updated weights for policy 0, policy_version 430 (0.0020) [2024-12-23 14:42:07,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1777664. Throughput: 0: 1005.3. Samples: 444772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:42:07,129][00245] Avg episode reward: [(0, '8.256')] [2024-12-23 14:42:07,138][02370] Saving new best policy, reward=8.256! [2024-12-23 14:42:12,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1789952. Throughput: 0: 942.2. Samples: 448938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:42:12,131][00245] Avg episode reward: [(0, '8.953')] [2024-12-23 14:42:12,140][02370] Saving new best policy, reward=8.953! 
[2024-12-23 14:42:14,070][02383] Updated weights for policy 0, policy_version 440 (0.0027) [2024-12-23 14:42:17,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 1814528. Throughput: 0: 939.4. Samples: 452370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:42:17,132][00245] Avg episode reward: [(0, '9.783')] [2024-12-23 14:42:17,138][02370] Saving new best policy, reward=9.783! [2024-12-23 14:42:22,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1839104. Throughput: 0: 992.1. Samples: 459240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:42:22,128][00245] Avg episode reward: [(0, '11.045')] [2024-12-23 14:42:22,137][02370] Saving new best policy, reward=11.045! [2024-12-23 14:42:23,585][02383] Updated weights for policy 0, policy_version 450 (0.0028) [2024-12-23 14:42:27,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 1851392. Throughput: 0: 953.6. Samples: 463654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:42:27,132][00245] Avg episode reward: [(0, '11.018')] [2024-12-23 14:42:32,126][00245] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1871872. Throughput: 0: 933.2. Samples: 466264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:42:32,135][00245] Avg episode reward: [(0, '11.165')] [2024-12-23 14:42:32,149][02370] Saving new best policy, reward=11.165! [2024-12-23 14:42:34,460][02383] Updated weights for policy 0, policy_version 460 (0.0023) [2024-12-23 14:42:37,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1892352. Throughput: 0: 974.6. Samples: 473416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:42:37,128][00245] Avg episode reward: [(0, '11.091')] [2024-12-23 14:42:42,126][00245] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1912832. Throughput: 0: 991.4. 
Samples: 479416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:42:42,131][00245] Avg episode reward: [(0, '10.452')] [2024-12-23 14:42:45,128][02383] Updated weights for policy 0, policy_version 470 (0.0048) [2024-12-23 14:42:47,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1929216. Throughput: 0: 964.2. Samples: 481660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:42:47,132][00245] Avg episode reward: [(0, '10.532')] [2024-12-23 14:42:52,126][00245] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1953792. Throughput: 0: 969.4. Samples: 488394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:42:52,132][00245] Avg episode reward: [(0, '10.567')] [2024-12-23 14:42:53,957][02383] Updated weights for policy 0, policy_version 480 (0.0016) [2024-12-23 14:42:57,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1978368. Throughput: 0: 1029.0. Samples: 495242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 14:42:57,132][00245] Avg episode reward: [(0, '10.718')] [2024-12-23 14:43:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1990656. Throughput: 0: 1000.2. Samples: 497378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:43:02,128][00245] Avg episode reward: [(0, '9.941')] [2024-12-23 14:43:05,191][02383] Updated weights for policy 0, policy_version 490 (0.0036) [2024-12-23 14:43:07,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2015232. Throughput: 0: 983.5. Samples: 503496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:43:07,134][00245] Avg episode reward: [(0, '11.491')] [2024-12-23 14:43:07,136][02370] Saving new best policy, reward=11.491! [2024-12-23 14:43:12,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3887.7). Total num frames: 2039808. Throughput: 0: 1044.8. Samples: 510670. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:43:12,132][00245] Avg episode reward: [(0, '12.119')] [2024-12-23 14:43:12,146][02370] Saving new best policy, reward=12.119! [2024-12-23 14:43:14,048][02383] Updated weights for policy 0, policy_version 500 (0.0031) [2024-12-23 14:43:17,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2056192. Throughput: 0: 1042.9. Samples: 513194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:43:17,128][00245] Avg episode reward: [(0, '13.509')] [2024-12-23 14:43:17,130][02370] Saving new best policy, reward=13.509! [2024-12-23 14:43:22,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2072576. Throughput: 0: 991.1. Samples: 518014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:43:22,134][00245] Avg episode reward: [(0, '13.474')] [2024-12-23 14:43:24,974][02383] Updated weights for policy 0, policy_version 510 (0.0016) [2024-12-23 14:43:27,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2097152. Throughput: 0: 1020.4. Samples: 525332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:43:27,127][00245] Avg episode reward: [(0, '13.995')] [2024-12-23 14:43:27,133][02370] Saving new best policy, reward=13.995! [2024-12-23 14:43:32,126][00245] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2117632. Throughput: 0: 1049.5. Samples: 528890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:43:32,133][00245] Avg episode reward: [(0, '14.373')] [2024-12-23 14:43:32,146][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth... [2024-12-23 14:43:32,299][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000287_1175552.pth [2024-12-23 14:43:32,319][02370] Saving new best policy, reward=14.373! 
[2024-12-23 14:43:35,779][02383] Updated weights for policy 0, policy_version 520 (0.0033) [2024-12-23 14:43:37,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2134016. Throughput: 0: 994.6. Samples: 533150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:43:37,128][00245] Avg episode reward: [(0, '13.480')] [2024-12-23 14:43:42,126][00245] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 2158592. Throughput: 0: 998.1. Samples: 540156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:43:42,133][00245] Avg episode reward: [(0, '13.632')] [2024-12-23 14:43:44,580][02383] Updated weights for policy 0, policy_version 530 (0.0019) [2024-12-23 14:43:47,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3915.5). Total num frames: 2179072. Throughput: 0: 1033.4. Samples: 543880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:43:47,128][00245] Avg episode reward: [(0, '13.964')] [2024-12-23 14:43:52,127][00245] Fps is (10 sec: 3686.0, 60 sec: 4027.6, 300 sec: 3901.6). Total num frames: 2195456. Throughput: 0: 1009.6. Samples: 548930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:43:52,131][00245] Avg episode reward: [(0, '13.510')] [2024-12-23 14:43:55,848][02383] Updated weights for policy 0, policy_version 540 (0.0023) [2024-12-23 14:43:57,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2215936. Throughput: 0: 985.7. Samples: 555026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:43:57,132][00245] Avg episode reward: [(0, '13.957')] [2024-12-23 14:44:02,126][00245] Fps is (10 sec: 4506.2, 60 sec: 4164.3, 300 sec: 3943.3). Total num frames: 2240512. Throughput: 0: 1009.4. Samples: 558616. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:44:02,128][00245] Avg episode reward: [(0, '13.362')] [2024-12-23 14:44:04,570][02383] Updated weights for policy 0, policy_version 550 (0.0028) [2024-12-23 14:44:07,134][00245] Fps is (10 sec: 4092.6, 60 sec: 4027.2, 300 sec: 3915.4). Total num frames: 2256896. Throughput: 0: 1041.7. Samples: 564898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:44:07,141][00245] Avg episode reward: [(0, '13.937')] [2024-12-23 14:44:12,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2277376. Throughput: 0: 992.6. Samples: 569998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:44:12,129][00245] Avg episode reward: [(0, '14.879')] [2024-12-23 14:44:12,138][02370] Saving new best policy, reward=14.879! [2024-12-23 14:44:15,302][02383] Updated weights for policy 0, policy_version 560 (0.0043) [2024-12-23 14:44:17,126][00245] Fps is (10 sec: 4509.3, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2301952. Throughput: 0: 995.7. Samples: 573698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:44:17,131][00245] Avg episode reward: [(0, '14.440')] [2024-12-23 14:44:22,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 2322432. Throughput: 0: 1054.4. Samples: 580596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:44:22,133][00245] Avg episode reward: [(0, '14.186')] [2024-12-23 14:44:25,806][02383] Updated weights for policy 0, policy_version 570 (0.0016) [2024-12-23 14:44:27,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2334720. Throughput: 0: 999.6. Samples: 585138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:44:27,137][00245] Avg episode reward: [(0, '13.834')] [2024-12-23 14:44:32,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 2359296. Throughput: 0: 992.6. Samples: 588548. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:44:32,129][00245] Avg episode reward: [(0, '13.883')] [2024-12-23 14:44:34,823][02383] Updated weights for policy 0, policy_version 580 (0.0031) [2024-12-23 14:44:37,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 2383872. Throughput: 0: 1043.6. Samples: 595890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:44:37,129][00245] Avg episode reward: [(0, '13.892')] [2024-12-23 14:44:42,130][00245] Fps is (10 sec: 4094.3, 60 sec: 4027.5, 300 sec: 3957.1). Total num frames: 2400256. Throughput: 0: 1023.3. Samples: 601078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 14:44:42,132][00245] Avg episode reward: [(0, '14.615')] [2024-12-23 14:44:46,152][02383] Updated weights for policy 0, policy_version 590 (0.0032) [2024-12-23 14:44:47,127][00245] Fps is (10 sec: 3685.9, 60 sec: 4027.6, 300 sec: 3957.1). Total num frames: 2420736. Throughput: 0: 999.9. Samples: 603612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:44:47,130][00245] Avg episode reward: [(0, '15.202')] [2024-12-23 14:44:47,136][02370] Saving new best policy, reward=15.202! [2024-12-23 14:44:52,126][00245] Fps is (10 sec: 4097.7, 60 sec: 4096.1, 300 sec: 3971.0). Total num frames: 2441216. Throughput: 0: 1015.7. Samples: 610596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:44:52,129][00245] Avg episode reward: [(0, '16.532')] [2024-12-23 14:44:52,230][02370] Saving new best policy, reward=16.532! [2024-12-23 14:44:54,968][02383] Updated weights for policy 0, policy_version 600 (0.0016) [2024-12-23 14:44:57,126][00245] Fps is (10 sec: 4096.5, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2461696. Throughput: 0: 1034.5. Samples: 616550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:44:57,128][00245] Avg episode reward: [(0, '17.129')] [2024-12-23 14:44:57,132][02370] Saving new best policy, reward=17.129! 
[2024-12-23 14:45:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2478080. Throughput: 0: 1000.7. Samples: 618728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:45:02,131][00245] Avg episode reward: [(0, '16.427')] [2024-12-23 14:45:05,830][02383] Updated weights for policy 0, policy_version 610 (0.0031) [2024-12-23 14:45:07,126][00245] Fps is (10 sec: 4096.1, 60 sec: 4096.6, 300 sec: 3984.9). Total num frames: 2502656. Throughput: 0: 994.2. Samples: 625336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:45:07,131][00245] Avg episode reward: [(0, '16.710')] [2024-12-23 14:45:12,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2527232. Throughput: 0: 1050.5. Samples: 632410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:45:12,136][00245] Avg episode reward: [(0, '15.498')] [2024-12-23 14:45:16,182][02383] Updated weights for policy 0, policy_version 620 (0.0029) [2024-12-23 14:45:17,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2539520. Throughput: 0: 1024.5. Samples: 634650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:45:17,136][00245] Avg episode reward: [(0, '15.754')] [2024-12-23 14:45:22,126][00245] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 2560000. Throughput: 0: 982.6. Samples: 640108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:45:22,128][00245] Avg episode reward: [(0, '15.858')] [2024-12-23 14:45:25,486][02383] Updated weights for policy 0, policy_version 630 (0.0017) [2024-12-23 14:45:27,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2584576. Throughput: 0: 1032.4. Samples: 647532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:45:27,128][00245] Avg episode reward: [(0, '15.710')] [2024-12-23 14:45:32,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). 
Total num frames: 2605056. Throughput: 0: 1045.6. Samples: 650662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:45:32,128][00245] Avg episode reward: [(0, '16.886')] [2024-12-23 14:45:32,140][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000636_2605056.pth... [2024-12-23 14:45:32,358][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000402_1646592.pth [2024-12-23 14:45:36,619][02383] Updated weights for policy 0, policy_version 640 (0.0026) [2024-12-23 14:45:37,126][00245] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2621440. Throughput: 0: 990.8. Samples: 655182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:45:37,129][00245] Avg episode reward: [(0, '17.293')] [2024-12-23 14:45:37,132][02370] Saving new best policy, reward=17.293! [2024-12-23 14:45:42,126][00245] Fps is (10 sec: 4096.1, 60 sec: 4096.3, 300 sec: 4012.7). Total num frames: 2646016. Throughput: 0: 1018.5. Samples: 662384. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:45:42,128][00245] Avg episode reward: [(0, '18.343')] [2024-12-23 14:45:42,144][02370] Saving new best policy, reward=18.343! [2024-12-23 14:45:45,370][02383] Updated weights for policy 0, policy_version 650 (0.0019) [2024-12-23 14:45:47,126][00245] Fps is (10 sec: 4505.7, 60 sec: 4096.1, 300 sec: 3998.8). Total num frames: 2666496. Throughput: 0: 1049.1. Samples: 665936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 14:45:47,131][00245] Avg episode reward: [(0, '19.027')] [2024-12-23 14:45:47,134][02370] Saving new best policy, reward=19.027! [2024-12-23 14:45:52,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3985.0). Total num frames: 2682880. Throughput: 0: 1006.5. Samples: 670628. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:45:52,132][00245] Avg episode reward: [(0, '17.966')] [2024-12-23 14:45:56,632][02383] Updated weights for policy 0, policy_version 660 (0.0019) [2024-12-23 14:45:57,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 2703360. Throughput: 0: 991.8. Samples: 677040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:45:57,131][00245] Avg episode reward: [(0, '16.658')] [2024-12-23 14:46:02,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 2727936. Throughput: 0: 1022.3. Samples: 680652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:46:02,131][00245] Avg episode reward: [(0, '15.142')] [2024-12-23 14:46:06,268][02383] Updated weights for policy 0, policy_version 670 (0.0027) [2024-12-23 14:46:07,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2744320. Throughput: 0: 1032.4. Samples: 686564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:46:07,130][00245] Avg episode reward: [(0, '14.901')] [2024-12-23 14:46:12,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2764800. Throughput: 0: 987.9. Samples: 691988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:46:12,133][00245] Avg episode reward: [(0, '14.554')] [2024-12-23 14:46:16,080][02383] Updated weights for policy 0, policy_version 680 (0.0025) [2024-12-23 14:46:17,126][00245] Fps is (10 sec: 4505.5, 60 sec: 4164.2, 300 sec: 4040.5). Total num frames: 2789376. Throughput: 0: 1001.2. Samples: 695716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:46:17,130][00245] Avg episode reward: [(0, '15.467')] [2024-12-23 14:46:22,128][00245] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4026.6). Total num frames: 2809856. Throughput: 0: 1045.9. Samples: 702252. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:46:22,130][00245] Avg episode reward: [(0, '15.774')] [2024-12-23 14:46:27,126][00245] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2822144. Throughput: 0: 988.7. Samples: 706874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:46:27,128][00245] Avg episode reward: [(0, '16.626')] [2024-12-23 14:46:27,313][02383] Updated weights for policy 0, policy_version 690 (0.0051) [2024-12-23 14:46:32,126][00245] Fps is (10 sec: 3687.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2846720. Throughput: 0: 990.0. Samples: 710486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:46:32,128][00245] Avg episode reward: [(0, '17.112')] [2024-12-23 14:46:35,721][02383] Updated weights for policy 0, policy_version 700 (0.0024) [2024-12-23 14:46:37,128][00245] Fps is (10 sec: 4914.2, 60 sec: 4164.1, 300 sec: 4040.4). Total num frames: 2871296. Throughput: 0: 1049.7. Samples: 717868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:46:37,135][00245] Avg episode reward: [(0, '18.026')] [2024-12-23 14:46:42,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2887680. Throughput: 0: 1012.7. Samples: 722612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:46:42,132][00245] Avg episode reward: [(0, '17.629')] [2024-12-23 14:46:46,890][02383] Updated weights for policy 0, policy_version 710 (0.0028) [2024-12-23 14:46:47,126][00245] Fps is (10 sec: 3687.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2908160. Throughput: 0: 994.2. Samples: 725392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:46:47,131][00245] Avg episode reward: [(0, '18.403')] [2024-12-23 14:46:52,126][00245] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 4054.3). Total num frames: 2932736. Throughput: 0: 1023.4. Samples: 732618. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:46:52,134][00245] Avg episode reward: [(0, '18.517')] [2024-12-23 14:46:56,111][02383] Updated weights for policy 0, policy_version 720 (0.0027) [2024-12-23 14:46:57,127][00245] Fps is (10 sec: 4095.5, 60 sec: 4095.9, 300 sec: 4040.4). Total num frames: 2949120. Throughput: 0: 1030.6. Samples: 738368. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-23 14:46:57,134][00245] Avg episode reward: [(0, '18.924')] [2024-12-23 14:47:02,126][00245] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2965504. Throughput: 0: 996.2. Samples: 740544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:47:02,128][00245] Avg episode reward: [(0, '18.925')] [2024-12-23 14:47:06,497][02383] Updated weights for policy 0, policy_version 730 (0.0020) [2024-12-23 14:47:07,126][00245] Fps is (10 sec: 4096.5, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2990080. Throughput: 0: 1006.6. Samples: 747546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:47:07,131][00245] Avg episode reward: [(0, '19.794')] [2024-12-23 14:47:07,133][02370] Saving new best policy, reward=19.794! [2024-12-23 14:47:12,129][00245] Fps is (10 sec: 4913.7, 60 sec: 4164.0, 300 sec: 4068.2). Total num frames: 3014656. Throughput: 0: 1052.2. Samples: 754226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:47:12,131][00245] Avg episode reward: [(0, '19.347')] [2024-12-23 14:47:17,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3026944. Throughput: 0: 1019.7. Samples: 756372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:47:17,130][00245] Avg episode reward: [(0, '20.066')] [2024-12-23 14:47:17,132][02370] Saving new best policy, reward=20.066! [2024-12-23 14:47:17,705][02383] Updated weights for policy 0, policy_version 740 (0.0019) [2024-12-23 14:47:22,126][00245] Fps is (10 sec: 3277.8, 60 sec: 3959.6, 300 sec: 4054.3). 
Total num frames: 3047424. Throughput: 0: 982.5. Samples: 762078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:47:22,127][00245] Avg episode reward: [(0, '19.739')] [2024-12-23 14:47:26,373][02383] Updated weights for policy 0, policy_version 750 (0.0021) [2024-12-23 14:47:27,126][00245] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3072000. Throughput: 0: 1041.4. Samples: 769474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:47:27,127][00245] Avg episode reward: [(0, '19.331')] [2024-12-23 14:47:32,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3092480. Throughput: 0: 1040.1. Samples: 772196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:47:32,128][00245] Avg episode reward: [(0, '19.458')] [2024-12-23 14:47:32,137][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000755_3092480.pth... [2024-12-23 14:47:32,280][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth [2024-12-23 14:47:37,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4054.3). Total num frames: 3108864. Throughput: 0: 988.1. Samples: 777080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:47:37,132][00245] Avg episode reward: [(0, '21.822')] [2024-12-23 14:47:37,136][02370] Saving new best policy, reward=21.822! [2024-12-23 14:47:37,642][02383] Updated weights for policy 0, policy_version 760 (0.0026) [2024-12-23 14:47:42,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3133440. Throughput: 0: 1020.9. Samples: 784306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:47:42,128][00245] Avg episode reward: [(0, '21.035')] [2024-12-23 14:47:46,596][02383] Updated weights for policy 0, policy_version 770 (0.0016) [2024-12-23 14:47:47,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3153920. 
Throughput: 0: 1053.5. Samples: 787950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:47:47,128][00245] Avg episode reward: [(0, '21.803')] [2024-12-23 14:47:52,130][00245] Fps is (10 sec: 3275.5, 60 sec: 3891.0, 300 sec: 4026.5). Total num frames: 3166208. Throughput: 0: 991.6. Samples: 792172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:47:52,135][00245] Avg episode reward: [(0, '21.287')] [2024-12-23 14:47:57,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 3190784. Throughput: 0: 979.0. Samples: 798276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:47:57,131][00245] Avg episode reward: [(0, '20.112')] [2024-12-23 14:47:57,896][02383] Updated weights for policy 0, policy_version 780 (0.0033) [2024-12-23 14:48:02,126][00245] Fps is (10 sec: 4917.2, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3215360. Throughput: 0: 1013.7. Samples: 801988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:48:02,128][00245] Avg episode reward: [(0, '21.279')] [2024-12-23 14:48:07,127][00245] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 3227648. Throughput: 0: 1007.2. Samples: 807404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:48:07,129][00245] Avg episode reward: [(0, '20.605')] [2024-12-23 14:48:08,775][02383] Updated weights for policy 0, policy_version 790 (0.0032) [2024-12-23 14:48:12,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 4040.5). Total num frames: 3248128. Throughput: 0: 974.6. Samples: 813332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:48:12,128][00245] Avg episode reward: [(0, '20.123')] [2024-12-23 14:48:17,126][00245] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3272704. Throughput: 0: 994.3. Samples: 816940. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:48:17,132][00245] Avg episode reward: [(0, '19.984')] [2024-12-23 14:48:17,341][02383] Updated weights for policy 0, policy_version 800 (0.0018) [2024-12-23 14:48:22,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3293184. Throughput: 0: 1028.4. Samples: 823358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:48:22,131][00245] Avg episode reward: [(0, '19.825')] [2024-12-23 14:48:27,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3309568. Throughput: 0: 974.8. Samples: 828174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:48:27,128][00245] Avg episode reward: [(0, '18.021')] [2024-12-23 14:48:28,699][02383] Updated weights for policy 0, policy_version 810 (0.0043) [2024-12-23 14:48:32,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3334144. Throughput: 0: 976.4. Samples: 831886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:48:32,128][00245] Avg episode reward: [(0, '19.138')] [2024-12-23 14:48:37,128][00245] Fps is (10 sec: 4504.6, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 3354624. Throughput: 0: 1041.5. Samples: 839036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:48:37,131][00245] Avg episode reward: [(0, '19.657')] [2024-12-23 14:48:37,573][02383] Updated weights for policy 0, policy_version 820 (0.0024) [2024-12-23 14:48:42,130][00245] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 4040.4). Total num frames: 3371008. Throughput: 0: 1005.0. Samples: 843504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:48:42,132][00245] Avg episode reward: [(0, '19.913')] [2024-12-23 14:48:47,126][00245] Fps is (10 sec: 3687.2, 60 sec: 3959.5, 300 sec: 4054.4). Total num frames: 3391488. Throughput: 0: 993.6. Samples: 846700. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:48:47,132][00245] Avg episode reward: [(0, '20.956')] [2024-12-23 14:48:48,125][02383] Updated weights for policy 0, policy_version 830 (0.0023) [2024-12-23 14:48:52,126][00245] Fps is (10 sec: 4507.5, 60 sec: 4164.5, 300 sec: 4068.2). Total num frames: 3416064. Throughput: 0: 1034.1. Samples: 853936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:48:52,131][00245] Avg episode reward: [(0, '21.299')] [2024-12-23 14:48:57,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3432448. Throughput: 0: 1025.5. Samples: 859478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:48:57,130][00245] Avg episode reward: [(0, '20.222')] [2024-12-23 14:48:58,588][02383] Updated weights for policy 0, policy_version 840 (0.0017) [2024-12-23 14:49:02,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.5). Total num frames: 3452928. Throughput: 0: 994.8. Samples: 861704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:49:02,128][00245] Avg episode reward: [(0, '19.177')] [2024-12-23 14:49:07,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3477504. Throughput: 0: 1012.0. Samples: 868896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:49:07,128][00245] Avg episode reward: [(0, '18.365')] [2024-12-23 14:49:07,654][02383] Updated weights for policy 0, policy_version 850 (0.0030) [2024-12-23 14:49:12,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3497984. Throughput: 0: 1048.8. Samples: 875368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:49:12,131][00245] Avg episode reward: [(0, '18.235')] [2024-12-23 14:49:17,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3510272. Throughput: 0: 1015.4. Samples: 877580. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 14:49:17,133][00245] Avg episode reward: [(0, '19.195')] [2024-12-23 14:49:18,850][02383] Updated weights for policy 0, policy_version 860 (0.0027) [2024-12-23 14:49:22,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3534848. Throughput: 0: 996.8. Samples: 883888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:49:22,128][00245] Avg episode reward: [(0, '20.683')] [2024-12-23 14:49:27,128][00245] Fps is (10 sec: 4913.9, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 3559424. Throughput: 0: 1061.1. Samples: 891250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:49:27,131][00245] Avg episode reward: [(0, '21.059')] [2024-12-23 14:49:27,466][02383] Updated weights for policy 0, policy_version 870 (0.0035) [2024-12-23 14:49:32,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3575808. Throughput: 0: 1038.5. Samples: 893434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:49:32,130][00245] Avg episode reward: [(0, '21.528')] [2024-12-23 14:49:32,140][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000873_3575808.pth... [2024-12-23 14:49:32,310][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000636_2605056.pth [2024-12-23 14:49:37,126][00245] Fps is (10 sec: 3687.4, 60 sec: 4027.9, 300 sec: 4054.4). Total num frames: 3596288. Throughput: 0: 998.0. Samples: 898844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:49:37,128][00245] Avg episode reward: [(0, '21.872')] [2024-12-23 14:49:37,135][02370] Saving new best policy, reward=21.872! [2024-12-23 14:49:38,425][02383] Updated weights for policy 0, policy_version 880 (0.0017) [2024-12-23 14:49:42,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4164.6, 300 sec: 4068.3). Total num frames: 3620864. Throughput: 0: 1036.7. Samples: 906130. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:49:42,128][00245] Avg episode reward: [(0, '21.467')] [2024-12-23 14:49:47,129][00245] Fps is (10 sec: 4094.7, 60 sec: 4095.8, 300 sec: 4054.3). Total num frames: 3637248. Throughput: 0: 1058.8. Samples: 909354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:49:47,131][00245] Avg episode reward: [(0, '21.460')] [2024-12-23 14:49:48,547][02383] Updated weights for policy 0, policy_version 890 (0.0036) [2024-12-23 14:49:52,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3653632. Throughput: 0: 997.9. Samples: 913802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 14:49:52,130][00245] Avg episode reward: [(0, '21.763')] [2024-12-23 14:49:57,126][00245] Fps is (10 sec: 4097.3, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3678208. Throughput: 0: 1012.7. Samples: 920940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:49:57,131][00245] Avg episode reward: [(0, '20.771')] [2024-12-23 14:49:57,988][02383] Updated weights for policy 0, policy_version 900 (0.0020) [2024-12-23 14:50:02,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3702784. Throughput: 0: 1044.3. Samples: 924574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:50:02,128][00245] Avg episode reward: [(0, '19.954')] [2024-12-23 14:50:07,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3719168. Throughput: 0: 1012.9. Samples: 929468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:50:07,130][00245] Avg episode reward: [(0, '20.478')] [2024-12-23 14:50:09,261][02383] Updated weights for policy 0, policy_version 910 (0.0042) [2024-12-23 14:50:12,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3739648. Throughput: 0: 990.8. Samples: 935834. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:50:12,128][00245] Avg episode reward: [(0, '19.333')] [2024-12-23 14:50:17,126][00245] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 3764224. Throughput: 0: 1023.9. Samples: 939508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:50:17,131][00245] Avg episode reward: [(0, '19.769')] [2024-12-23 14:50:17,475][02383] Updated weights for policy 0, policy_version 920 (0.0025) [2024-12-23 14:50:22,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3780608. Throughput: 0: 1039.0. Samples: 945598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 14:50:22,131][00245] Avg episode reward: [(0, '20.390')] [2024-12-23 14:50:27,126][00245] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 4040.5). Total num frames: 3796992. Throughput: 0: 993.0. Samples: 950816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:50:27,133][00245] Avg episode reward: [(0, '21.325')] [2024-12-23 14:50:28,791][02383] Updated weights for policy 0, policy_version 930 (0.0038) [2024-12-23 14:50:32,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3821568. Throughput: 0: 1003.0. Samples: 954488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:50:32,133][00245] Avg episode reward: [(0, '21.494')] [2024-12-23 14:50:37,126][00245] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3846144. Throughput: 0: 1062.7. Samples: 961624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:50:37,134][00245] Avg episode reward: [(0, '21.353')] [2024-12-23 14:50:37,986][02383] Updated weights for policy 0, policy_version 940 (0.0032) [2024-12-23 14:50:42,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3858432. Throughput: 0: 1003.4. Samples: 966094. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:50:42,128][00245] Avg episode reward: [(0, '20.739')] [2024-12-23 14:50:47,126][00245] Fps is (10 sec: 3686.4, 60 sec: 4096.2, 300 sec: 4068.2). Total num frames: 3883008. Throughput: 0: 997.2. Samples: 969450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 14:50:47,130][00245] Avg episode reward: [(0, '21.777')] [2024-12-23 14:50:48,350][02383] Updated weights for policy 0, policy_version 950 (0.0013) [2024-12-23 14:50:52,126][00245] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 3907584. Throughput: 0: 1049.9. Samples: 976716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:50:52,128][00245] Avg episode reward: [(0, '20.982')] [2024-12-23 14:50:57,130][00245] Fps is (10 sec: 4094.1, 60 sec: 4095.7, 300 sec: 4054.3). Total num frames: 3923968. Throughput: 0: 1020.5. Samples: 981762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 14:50:57,133][00245] Avg episode reward: [(0, '20.546')] [2024-12-23 14:50:59,458][02383] Updated weights for policy 0, policy_version 960 (0.0030) [2024-12-23 14:51:02,126][00245] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3944448. Throughput: 0: 994.7. Samples: 984268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:51:02,128][00245] Avg episode reward: [(0, '20.773')] [2024-12-23 14:51:07,126][00245] Fps is (10 sec: 4097.9, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3964928. Throughput: 0: 1023.1. Samples: 991638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 14:51:07,128][00245] Avg episode reward: [(0, '19.771')] [2024-12-23 14:51:07,926][02383] Updated weights for policy 0, policy_version 970 (0.0032) [2024-12-23 14:51:12,126][00245] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 3985408. Throughput: 0: 1042.6. Samples: 997734. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 14:51:12,128][00245] Avg episode reward: [(0, '20.145')] [2024-12-23 14:51:17,126][00245] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 4001792. Throughput: 0: 1009.8. Samples: 999928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 14:51:17,128][00245] Avg episode reward: [(0, '20.846')] [2024-12-23 14:51:17,422][00245] Component Batcher_0 stopped! [2024-12-23 14:51:17,422][02370] Stopping Batcher_0... [2024-12-23 14:51:17,427][02370] Loop batcher_evt_loop terminating... [2024-12-23 14:51:17,429][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 14:51:17,480][02383] Weights refcount: 2 0 [2024-12-23 14:51:17,483][00245] Component InferenceWorker_p0-w0 stopped! [2024-12-23 14:51:17,486][02383] Stopping InferenceWorker_p0-w0... [2024-12-23 14:51:17,489][02383] Loop inference_proc0-0_evt_loop terminating... [2024-12-23 14:51:17,564][02370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000755_3092480.pth [2024-12-23 14:51:17,583][02370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 14:51:17,784][00245] Component LearnerWorker_p0 stopped! [2024-12-23 14:51:17,788][02385] Stopping RolloutWorker_w1... [2024-12-23 14:51:17,789][02385] Loop rollout_proc1_evt_loop terminating... [2024-12-23 14:51:17,790][00245] Component RolloutWorker_w1 stopped! [2024-12-23 14:51:17,786][02370] Stopping LearnerWorker_p0... [2024-12-23 14:51:17,795][02370] Loop learner_proc0_evt_loop terminating... [2024-12-23 14:51:17,812][02387] Stopping RolloutWorker_w3... [2024-12-23 14:51:17,812][00245] Component RolloutWorker_w3 stopped! [2024-12-23 14:51:17,812][02387] Loop rollout_proc3_evt_loop terminating... [2024-12-23 14:51:17,824][02388] Stopping RolloutWorker_w5... [2024-12-23 14:51:17,825][02388] Loop rollout_proc5_evt_loop terminating... 
[2024-12-23 14:51:17,824][00245] Component RolloutWorker_w5 stopped! [2024-12-23 14:51:17,871][00245] Component RolloutWorker_w6 stopped! [2024-12-23 14:51:17,873][02390] Stopping RolloutWorker_w6... [2024-12-23 14:51:17,875][02390] Loop rollout_proc6_evt_loop terminating... [2024-12-23 14:51:17,902][02391] Stopping RolloutWorker_w7... [2024-12-23 14:51:17,903][02391] Loop rollout_proc7_evt_loop terminating... [2024-12-23 14:51:17,903][00245] Component RolloutWorker_w7 stopped! [2024-12-23 14:51:17,906][00245] Component RolloutWorker_w2 stopped! [2024-12-23 14:51:17,909][02386] Stopping RolloutWorker_w2... [2024-12-23 14:51:17,913][02386] Loop rollout_proc2_evt_loop terminating... [2024-12-23 14:51:17,932][00245] Component RolloutWorker_w0 stopped! [2024-12-23 14:51:17,934][02384] Stopping RolloutWorker_w0... [2024-12-23 14:51:17,936][02384] Loop rollout_proc0_evt_loop terminating... [2024-12-23 14:51:17,973][00245] Component RolloutWorker_w4 stopped! [2024-12-23 14:51:17,975][00245] Waiting for process learner_proc0 to stop... [2024-12-23 14:51:17,978][02389] Stopping RolloutWorker_w4... [2024-12-23 14:51:17,979][02389] Loop rollout_proc4_evt_loop terminating... [2024-12-23 14:51:19,459][00245] Waiting for process inference_proc0-0 to join... [2024-12-23 14:51:19,466][00245] Waiting for process rollout_proc0 to join... [2024-12-23 14:51:21,502][00245] Waiting for process rollout_proc1 to join... [2024-12-23 14:51:21,509][00245] Waiting for process rollout_proc2 to join... [2024-12-23 14:51:21,513][00245] Waiting for process rollout_proc3 to join... [2024-12-23 14:51:21,517][00245] Waiting for process rollout_proc4 to join... [2024-12-23 14:51:21,521][00245] Waiting for process rollout_proc5 to join... [2024-12-23 14:51:21,524][00245] Waiting for process rollout_proc6 to join... [2024-12-23 14:51:21,528][00245] Waiting for process rollout_proc7 to join... 
[2024-12-23 14:51:21,531][00245] Batcher 0 profile tree view:
batching: 26.5475, releasing_batches: 0.0277
[2024-12-23 14:51:21,533][00245] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 408.5340
update_model: 8.5356
  weight_update: 0.0030
one_step: 0.0068
  handle_policy_step: 569.6260
    deserialize: 14.1302, stack: 3.0953, obs_to_device_normalize: 121.7903, forward: 285.3483, send_messages: 29.1042
    prepare_outputs: 87.2346
      to_cpu: 52.9357
[2024-12-23 14:51:21,536][00245] Learner 0 profile tree view:
misc: 0.0064, prepare_batch: 13.2114
train: 72.9652
  epoch_init: 0.0056, minibatch_init: 0.0063, losses_postprocess: 0.6679, kl_divergence: 0.7229, after_optimizer: 33.6001
  calculate_losses: 25.6338
    losses_init: 0.0039, forward_head: 1.2713, bptt_initial: 17.1416, tail: 1.0440, advantages_returns: 0.2800, losses: 3.6246
    bptt: 1.9546
      bptt_forward_core: 1.8748
  update: 11.6017
    clip: 0.8743
[2024-12-23 14:51:21,538][00245] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2716, enqueue_policy_requests: 96.6778, env_step: 806.8398, overhead: 12.7852, complete_rollouts: 7.1490
save_policy_outputs: 20.2967
  split_output_tensors: 7.9444
[2024-12-23 14:51:21,539][00245] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3208, enqueue_policy_requests: 97.5635, env_step: 800.5676, overhead: 13.0717, complete_rollouts: 7.2784
save_policy_outputs: 21.1770
  split_output_tensors: 8.4067
[2024-12-23 14:51:21,540][00245] Loop Runner_EvtLoop terminating...
[2024-12-23 14:51:21,542][00245] Runner profile tree view:
main_loop: 1060.3480
[2024-12-23 14:51:21,543][00245] Collected {0: 4005888}, FPS: 3777.9
[2024-12-23 14:51:21,598][00245] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-23 14:51:21,600][00245] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-23 14:51:21,603][00245] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-23 14:51:21,604][00245] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-23 14:51:21,607][00245] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-23 14:51:21,610][00245] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-23 14:51:21,611][00245] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-23 14:51:21,612][00245] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-23 14:51:21,613][00245] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-23 14:51:21,614][00245] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-23 14:51:21,615][00245] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-23 14:51:21,616][00245] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-23 14:51:21,617][00245] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-23 14:51:21,618][00245] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-23 14:51:21,619][00245] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-23 14:51:21,653][00245] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 14:51:21,657][00245] RunningMeanStd input shape: (3, 72, 128)
[2024-12-23 14:51:21,659][00245] RunningMeanStd input shape: (1,)
[2024-12-23 14:51:21,675][00245] ConvEncoder: input_channels=3
[2024-12-23 14:51:21,780][00245] Conv encoder output size: 512
[2024-12-23 14:51:21,782][00245] Policy head output size: 512
[2024-12-23 14:51:21,955][00245] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-23 14:51:22,749][00245] Num frames 100...
[2024-12-23 14:51:22,871][00245] Num frames 200... [2024-12-23 14:51:22,991][00245] Num frames 300... [2024-12-23 14:51:23,121][00245] Num frames 400... [2024-12-23 14:51:23,271][00245] Avg episode rewards: #0: 7.800, true rewards: #0: 4.800 [2024-12-23 14:51:23,272][00245] Avg episode reward: 7.800, avg true_objective: 4.800 [2024-12-23 14:51:23,298][00245] Num frames 500... [2024-12-23 14:51:23,416][00245] Num frames 600... [2024-12-23 14:51:23,538][00245] Num frames 700... [2024-12-23 14:51:23,666][00245] Num frames 800... [2024-12-23 14:51:23,783][00245] Num frames 900... [2024-12-23 14:51:23,903][00245] Num frames 1000... [2024-12-23 14:51:24,022][00245] Num frames 1100... [2024-12-23 14:51:24,152][00245] Num frames 1200... [2024-12-23 14:51:24,274][00245] Num frames 1300... [2024-12-23 14:51:24,391][00245] Num frames 1400... [2024-12-23 14:51:24,510][00245] Num frames 1500... [2024-12-23 14:51:24,634][00245] Num frames 1600... [2024-12-23 14:51:24,757][00245] Num frames 1700... [2024-12-23 14:51:24,930][00245] Avg episode rewards: #0: 16.960, true rewards: #0: 8.960 [2024-12-23 14:51:24,932][00245] Avg episode reward: 16.960, avg true_objective: 8.960 [2024-12-23 14:51:24,945][00245] Num frames 1800... [2024-12-23 14:51:25,066][00245] Num frames 1900... [2024-12-23 14:51:25,196][00245] Num frames 2000... [2024-12-23 14:51:25,314][00245] Num frames 2100... [2024-12-23 14:51:25,432][00245] Num frames 2200... [2024-12-23 14:51:25,561][00245] Num frames 2300... [2024-12-23 14:51:25,679][00245] Num frames 2400... [2024-12-23 14:51:25,805][00245] Num frames 2500... [2024-12-23 14:51:25,924][00245] Num frames 2600... [2024-12-23 14:51:26,010][00245] Avg episode rewards: #0: 17.413, true rewards: #0: 8.747 [2024-12-23 14:51:26,014][00245] Avg episode reward: 17.413, avg true_objective: 8.747 [2024-12-23 14:51:26,106][00245] Num frames 2700... [2024-12-23 14:51:26,234][00245] Num frames 2800... [2024-12-23 14:51:26,353][00245] Num frames 2900... 
[2024-12-23 14:51:26,528][00245] Num frames 3000... [2024-12-23 14:51:26,709][00245] Num frames 3100... [2024-12-23 14:51:26,872][00245] Avg episode rewards: #0: 14.920, true rewards: #0: 7.920 [2024-12-23 14:51:26,875][00245] Avg episode reward: 14.920, avg true_objective: 7.920 [2024-12-23 14:51:26,927][00245] Num frames 3200... [2024-12-23 14:51:27,095][00245] Num frames 3300... [2024-12-23 14:51:27,265][00245] Num frames 3400... [2024-12-23 14:51:27,424][00245] Num frames 3500... [2024-12-23 14:51:27,598][00245] Num frames 3600... [2024-12-23 14:51:27,764][00245] Num frames 3700... [2024-12-23 14:51:27,936][00245] Num frames 3800... [2024-12-23 14:51:28,107][00245] Num frames 3900... [2024-12-23 14:51:28,297][00245] Num frames 4000... [2024-12-23 14:51:28,473][00245] Num frames 4100... [2024-12-23 14:51:28,655][00245] Num frames 4200... [2024-12-23 14:51:28,832][00245] Num frames 4300... [2024-12-23 14:51:28,960][00245] Num frames 4400... [2024-12-23 14:51:29,081][00245] Num frames 4500... [2024-12-23 14:51:29,192][00245] Avg episode rewards: #0: 18.288, true rewards: #0: 9.088 [2024-12-23 14:51:29,194][00245] Avg episode reward: 18.288, avg true_objective: 9.088 [2024-12-23 14:51:29,272][00245] Num frames 4600... [2024-12-23 14:51:29,393][00245] Num frames 4700... [2024-12-23 14:51:29,512][00245] Num frames 4800... [2024-12-23 14:51:29,643][00245] Num frames 4900... [2024-12-23 14:51:29,763][00245] Num frames 5000... [2024-12-23 14:51:29,881][00245] Num frames 5100... [2024-12-23 14:51:30,006][00245] Num frames 5200... [2024-12-23 14:51:30,127][00245] Num frames 5300... [2024-12-23 14:51:30,246][00245] Num frames 5400... [2024-12-23 14:51:30,319][00245] Avg episode rewards: #0: 18.347, true rewards: #0: 9.013 [2024-12-23 14:51:30,320][00245] Avg episode reward: 18.347, avg true_objective: 9.013 [2024-12-23 14:51:30,431][00245] Num frames 5500... [2024-12-23 14:51:30,557][00245] Num frames 5600... [2024-12-23 14:51:30,677][00245] Num frames 5700... 
[2024-12-23 14:51:30,800][00245] Num frames 5800... [2024-12-23 14:51:30,918][00245] Num frames 5900... [2024-12-23 14:51:31,045][00245] Num frames 6000... [2024-12-23 14:51:31,162][00245] Avg episode rewards: #0: 17.502, true rewards: #0: 8.644 [2024-12-23 14:51:31,164][00245] Avg episode reward: 17.502, avg true_objective: 8.644 [2024-12-23 14:51:31,228][00245] Num frames 6100... [2024-12-23 14:51:31,353][00245] Num frames 6200... [2024-12-23 14:51:31,472][00245] Num frames 6300... [2024-12-23 14:51:31,598][00245] Num frames 6400... [2024-12-23 14:51:31,718][00245] Num frames 6500... [2024-12-23 14:51:31,849][00245] Avg episode rewards: #0: 16.329, true rewards: #0: 8.204 [2024-12-23 14:51:31,851][00245] Avg episode reward: 16.329, avg true_objective: 8.204 [2024-12-23 14:51:31,897][00245] Num frames 6600... [2024-12-23 14:51:32,019][00245] Num frames 6700... [2024-12-23 14:51:32,142][00245] Num frames 6800... [2024-12-23 14:51:32,260][00245] Num frames 6900... [2024-12-23 14:51:32,334][00245] Avg episode rewards: #0: 15.017, true rewards: #0: 7.683 [2024-12-23 14:51:32,336][00245] Avg episode reward: 15.017, avg true_objective: 7.683 [2024-12-23 14:51:32,439][00245] Num frames 7000... [2024-12-23 14:51:32,568][00245] Num frames 7100... [2024-12-23 14:51:32,693][00245] Num frames 7200... [2024-12-23 14:51:32,815][00245] Num frames 7300... [2024-12-23 14:51:32,934][00245] Num frames 7400... [2024-12-23 14:51:33,056][00245] Num frames 7500... [2024-12-23 14:51:33,175][00245] Num frames 7600... [2024-12-23 14:51:33,297][00245] Num frames 7700... [2024-12-23 14:51:33,423][00245] Num frames 7800... [2024-12-23 14:51:33,551][00245] Num frames 7900... [2024-12-23 14:51:33,675][00245] Num frames 8000... 
[2024-12-23 14:51:33,816][00245] Avg episode rewards: #0: 15.967, true rewards: #0: 8.067 [2024-12-23 14:51:33,818][00245] Avg episode reward: 15.967, avg true_objective: 8.067 [2024-12-23 14:52:20,826][00245] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-23 14:52:21,190][00245] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-23 14:52:21,192][00245] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-23 14:52:21,194][00245] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-23 14:52:21,196][00245] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-23 14:52:21,198][00245] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-23 14:52:21,200][00245] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-23 14:52:21,201][00245] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-23 14:52:21,203][00245] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-23 14:52:21,204][00245] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-23 14:52:21,205][00245] Adding new argument 'hf_repository'='wirthy21/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-23 14:52:21,206][00245] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-23 14:52:21,207][00245] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-23 14:52:21,208][00245] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-23 14:52:21,209][00245] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-12-23 14:52:21,210][00245] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-23 14:52:21,247][00245] RunningMeanStd input shape: (3, 72, 128) [2024-12-23 14:52:21,249][00245] RunningMeanStd input shape: (1,) [2024-12-23 14:52:21,268][00245] ConvEncoder: input_channels=3 [2024-12-23 14:52:21,327][00245] Conv encoder output size: 512 [2024-12-23 14:52:21,329][00245] Policy head output size: 512 [2024-12-23 14:52:21,354][00245] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 14:52:22,012][00245] Num frames 100... [2024-12-23 14:52:22,187][00245] Num frames 200... [2024-12-23 14:52:22,345][00245] Num frames 300... [2024-12-23 14:52:22,498][00245] Num frames 400... [2024-12-23 14:52:22,656][00245] Num frames 500... [2024-12-23 14:52:22,813][00245] Num frames 600... [2024-12-23 14:52:22,968][00245] Num frames 700... [2024-12-23 14:52:23,127][00245] Num frames 800... [2024-12-23 14:52:23,282][00245] Num frames 900... [2024-12-23 14:52:23,386][00245] Avg episode rewards: #0: 16.270, true rewards: #0: 9.270 [2024-12-23 14:52:23,388][00245] Avg episode reward: 16.270, avg true_objective: 9.270 [2024-12-23 14:52:23,516][00245] Num frames 1000... [2024-12-23 14:52:23,676][00245] Num frames 1100... [2024-12-23 14:52:23,850][00245] Num frames 1200... [2024-12-23 14:52:24,024][00245] Num frames 1300... [2024-12-23 14:52:24,205][00245] Num frames 1400... [2024-12-23 14:52:24,384][00245] Num frames 1500... [2024-12-23 14:52:24,568][00245] Num frames 1600... [2024-12-23 14:52:24,760][00245] Num frames 1700... [2024-12-23 14:52:24,926][00245] Avg episode rewards: #0: 17.295, true rewards: #0: 8.795 [2024-12-23 14:52:24,928][00245] Avg episode reward: 17.295, avg true_objective: 8.795 [2024-12-23 14:52:25,006][00245] Num frames 1800... [2024-12-23 14:52:25,191][00245] Num frames 1900... [2024-12-23 14:52:25,378][00245] Num frames 2000... 
[2024-12-23 14:52:25,559][00245] Num frames 2100... [2024-12-23 14:52:25,727][00245] Num frames 2200... [2024-12-23 14:52:25,884][00245] Num frames 2300... [2024-12-23 14:52:25,986][00245] Avg episode rewards: #0: 14.783, true rewards: #0: 7.783 [2024-12-23 14:52:25,989][00245] Avg episode reward: 14.783, avg true_objective: 7.783 [2024-12-23 14:52:26,067][00245] Num frames 2400... [2024-12-23 14:52:26,186][00245] Num frames 2500... [2024-12-23 14:52:26,314][00245] Num frames 2600... [2024-12-23 14:52:26,445][00245] Num frames 2700... [2024-12-23 14:52:26,575][00245] Num frames 2800... [2024-12-23 14:52:26,696][00245] Num frames 2900... [2024-12-23 14:52:26,816][00245] Num frames 3000... [2024-12-23 14:52:26,936][00245] Num frames 3100... [2024-12-23 14:52:27,056][00245] Num frames 3200... [2024-12-23 14:52:27,196][00245] Avg episode rewards: #0: 17.425, true rewards: #0: 8.175 [2024-12-23 14:52:27,197][00245] Avg episode reward: 17.425, avg true_objective: 8.175 [2024-12-23 14:52:27,237][00245] Num frames 3300... [2024-12-23 14:52:27,363][00245] Num frames 3400... [2024-12-23 14:52:27,487][00245] Num frames 3500... [2024-12-23 14:52:27,618][00245] Num frames 3600... [2024-12-23 14:52:27,736][00245] Num frames 3700... [2024-12-23 14:52:27,854][00245] Num frames 3800... [2024-12-23 14:52:27,975][00245] Num frames 3900... [2024-12-23 14:52:28,080][00245] Avg episode rewards: #0: 16.284, true rewards: #0: 7.884 [2024-12-23 14:52:28,081][00245] Avg episode reward: 16.284, avg true_objective: 7.884 [2024-12-23 14:52:28,152][00245] Num frames 4000... [2024-12-23 14:52:28,270][00245] Num frames 4100... [2024-12-23 14:52:28,397][00245] Num frames 4200... [2024-12-23 14:52:28,514][00245] Num frames 4300... [2024-12-23 14:52:28,644][00245] Num frames 4400... [2024-12-23 14:52:28,765][00245] Num frames 4500... [2024-12-23 14:52:28,883][00245] Num frames 4600... [2024-12-23 14:52:29,002][00245] Num frames 4700... [2024-12-23 14:52:29,119][00245] Num frames 4800... 
[2024-12-23 14:52:29,239][00245] Num frames 4900... [2024-12-23 14:52:29,370][00245] Num frames 5000... [2024-12-23 14:52:29,490][00245] Num frames 5100... [2024-12-23 14:52:29,620][00245] Num frames 5200... [2024-12-23 14:52:29,740][00245] Num frames 5300... [2024-12-23 14:52:29,859][00245] Num frames 5400... [2024-12-23 14:52:29,982][00245] Num frames 5500... [2024-12-23 14:52:30,100][00245] Num frames 5600... [2024-12-23 14:52:30,220][00245] Num frames 5700... [2024-12-23 14:52:30,340][00245] Num frames 5800... [2024-12-23 14:52:30,438][00245] Avg episode rewards: #0: 22.383, true rewards: #0: 9.717 [2024-12-23 14:52:30,440][00245] Avg episode reward: 22.383, avg true_objective: 9.717 [2024-12-23 14:52:30,526][00245] Num frames 5900... [2024-12-23 14:52:30,655][00245] Num frames 6000... [2024-12-23 14:52:30,775][00245] Num frames 6100... [2024-12-23 14:52:30,937][00245] Num frames 6200... [2024-12-23 14:52:31,104][00245] Num frames 6300... [2024-12-23 14:52:31,267][00245] Num frames 6400... [2024-12-23 14:52:31,437][00245] Num frames 6500... [2024-12-23 14:52:31,603][00245] Num frames 6600... [2024-12-23 14:52:31,817][00245] Avg episode rewards: #0: 21.706, true rewards: #0: 9.563 [2024-12-23 14:52:31,821][00245] Avg episode reward: 21.706, avg true_objective: 9.563 [2024-12-23 14:52:31,833][00245] Num frames 6700... [2024-12-23 14:52:31,992][00245] Num frames 6800... [2024-12-23 14:52:32,163][00245] Num frames 6900... [2024-12-23 14:52:32,349][00245] Num frames 7000... [2024-12-23 14:52:32,524][00245] Num frames 7100... [2024-12-23 14:52:32,703][00245] Num frames 7200... [2024-12-23 14:52:32,879][00245] Num frames 7300... [2024-12-23 14:52:33,049][00245] Num frames 7400... [2024-12-23 14:52:33,227][00245] Num frames 7500... [2024-12-23 14:52:33,367][00245] Num frames 7600... [2024-12-23 14:52:33,488][00245] Num frames 7700... [2024-12-23 14:52:33,625][00245] Num frames 7800... [2024-12-23 14:52:33,745][00245] Num frames 7900... 
[2024-12-23 14:52:33,864][00245] Num frames 8000... [2024-12-23 14:52:33,986][00245] Num frames 8100... [2024-12-23 14:52:34,046][00245] Avg episode rewards: #0: 23.252, true rewards: #0: 10.127 [2024-12-23 14:52:34,047][00245] Avg episode reward: 23.252, avg true_objective: 10.127 [2024-12-23 14:52:34,165][00245] Num frames 8200... [2024-12-23 14:52:34,284][00245] Num frames 8300... [2024-12-23 14:52:34,405][00245] Num frames 8400... [2024-12-23 14:52:34,531][00245] Num frames 8500... [2024-12-23 14:52:34,661][00245] Num frames 8600... [2024-12-23 14:52:34,783][00245] Num frames 8700... [2024-12-23 14:52:34,902][00245] Avg episode rewards: #0: 22.503, true rewards: #0: 9.726 [2024-12-23 14:52:34,903][00245] Avg episode reward: 22.503, avg true_objective: 9.726 [2024-12-23 14:52:34,962][00245] Num frames 8800... [2024-12-23 14:52:35,081][00245] Num frames 8900... [2024-12-23 14:52:35,199][00245] Num frames 9000... [2024-12-23 14:52:35,320][00245] Num frames 9100... [2024-12-23 14:52:35,441][00245] Num frames 9200... [2024-12-23 14:52:35,578][00245] Num frames 9300... [2024-12-23 14:52:35,700][00245] Num frames 9400... [2024-12-23 14:52:35,820][00245] Num frames 9500... [2024-12-23 14:52:35,940][00245] Num frames 9600... [2024-12-23 14:52:36,018][00245] Avg episode rewards: #0: 22.317, true rewards: #0: 9.617 [2024-12-23 14:52:36,020][00245] Avg episode reward: 22.317, avg true_objective: 9.617 [2024-12-23 14:53:31,912][00245] Replay video saved to /content/train_dir/default_experiment/replay.mp4!