[2024-10-03 23:32:19,347][01629] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-03 23:32:19,351][01629] Rollout worker 0 uses device cpu
[2024-10-03 23:32:19,352][01629] Rollout worker 1 uses device cpu
[2024-10-03 23:32:19,353][01629] Rollout worker 2 uses device cpu
[2024-10-03 23:32:19,357][01629] Rollout worker 3 uses device cpu
[2024-10-03 23:32:19,359][01629] Rollout worker 4 uses device cpu
[2024-10-03 23:32:19,360][01629] Rollout worker 5 uses device cpu
[2024-10-03 23:32:19,361][01629] Rollout worker 6 uses device cpu
[2024-10-03 23:32:19,362][01629] Rollout worker 7 uses device cpu
[2024-10-03 23:32:19,525][01629] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 23:32:19,526][01629] InferenceWorker_p0-w0: min num requests: 2
[2024-10-03 23:32:19,561][01629] Starting all processes...
[2024-10-03 23:32:19,564][01629] Starting process learner_proc0
[2024-10-03 23:32:20,253][01629] Starting all processes...
[2024-10-03 23:32:20,261][01629] Starting process inference_proc0-0
[2024-10-03 23:32:20,262][01629] Starting process rollout_proc0
[2024-10-03 23:32:20,280][01629] Starting process rollout_proc1
[2024-10-03 23:32:20,280][01629] Starting process rollout_proc2
[2024-10-03 23:32:20,282][01629] Starting process rollout_proc3
[2024-10-03 23:32:20,282][01629] Starting process rollout_proc4
[2024-10-03 23:32:20,282][01629] Starting process rollout_proc5
[2024-10-03 23:32:20,287][01629] Starting process rollout_proc6
[2024-10-03 23:32:20,287][01629] Starting process rollout_proc7
[2024-10-03 23:32:36,118][03601] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 23:32:36,121][03601] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-03 23:32:36,193][03601] Num visible devices: 1
[2024-10-03 23:32:36,244][03601] Starting seed is not provided
[2024-10-03 23:32:36,245][03601] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 23:32:36,245][03601] Initializing actor-critic model on device cuda:0
[2024-10-03 23:32:36,246][03601] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 23:32:36,250][03601] RunningMeanStd input shape: (1,)
[2024-10-03 23:32:36,337][03601] ConvEncoder: input_channels=3
[2024-10-03 23:32:36,387][03619] Worker 4 uses CPU cores [0]
[2024-10-03 23:32:36,451][03617] Worker 2 uses CPU cores [0]
[2024-10-03 23:32:36,630][03616] Worker 1 uses CPU cores [1]
[2024-10-03 23:32:36,631][03618] Worker 3 uses CPU cores [1]
[2024-10-03 23:32:36,632][03614] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 23:32:36,632][03614] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-03 23:32:36,662][03615] Worker 0 uses CPU cores [0]
[2024-10-03 23:32:36,713][03614] Num visible devices: 1
[2024-10-03 23:32:36,818][03622] Worker 7 uses CPU cores [1]
[2024-10-03 23:32:36,846][03621] Worker 6 uses CPU cores [0]
[2024-10-03 23:32:36,861][03620] Worker 5 uses CPU cores [1]
[2024-10-03 23:32:36,893][03601] Conv encoder output size: 512
[2024-10-03 23:32:36,893][03601] Policy head output size: 512
[2024-10-03 23:32:36,952][03601] Created Actor Critic model with architecture:
[2024-10-03 23:32:36,952][03601] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-03 23:32:37,299][03601] Using optimizer
[2024-10-03 23:32:38,224][03601] No checkpoints found
[2024-10-03 23:32:38,224][03601] Did not load from checkpoint, starting from scratch!
[2024-10-03 23:32:38,225][03601] Initialized policy 0 weights for model version 0
[2024-10-03 23:32:38,238][03601] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 23:32:38,246][03601] LearnerWorker_p0 finished initialization!
[2024-10-03 23:32:38,389][03614] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 23:32:38,391][03614] RunningMeanStd input shape: (1,)
[2024-10-03 23:32:38,412][03614] ConvEncoder: input_channels=3
[2024-10-03 23:32:38,579][03614] Conv encoder output size: 512
[2024-10-03 23:32:38,579][03614] Policy head output size: 512
[2024-10-03 23:32:38,654][01629] Inference worker 0-0 is ready!
[2024-10-03 23:32:38,657][01629] All inference workers are ready! Signal rollout workers to start!
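The log reports a ConvEncoder with input_channels=3 on (3, 72, 128) observations and a "Conv encoder output size: 512". As a rough sanity check, the spatial shapes can be traced layer by layer with the standard Conv2d size formula. The (kernel, stride) specs below are assumed, typical VizDoom-encoder defaults, not values read from this log:

```python
def conv2d_out(size, kernel, stride, padding=0):
    # Output spatial size of a Conv2d layer (floor division, as in PyTorch).
    return (size + 2 * padding - kernel) // stride + 1

# Assumed conv specs (out_channels, kernel, stride) for the three Conv2d/ELU
# pairs shown in the architecture dump; hypothetical, for illustration only.
conv_specs = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]

h, w = 72, 128  # resized observation, per "RunningMeanStd input shape: (3, 72, 128)"
for _, kernel, stride in conv_specs:
    h, w = conv2d_out(h, kernel, stride), conv2d_out(w, kernel, stride)

flat_features = conv_specs[-1][0] * h * w  # flattened conv output
print(h, w, flat_features)
```

Under these assumed specs the conv head flattens to 3 × 6 × 128 = 2304 features, which the Linear layer in `mlp_layers` would then map to the reported 512-dimensional encoder output.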
[2024-10-03 23:32:38,891][03617] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:38,896][03621] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:38,895][03615] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:38,907][03619] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:39,007][03616] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:39,006][03618] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:39,010][03622] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:39,011][03620] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:32:39,341][01629] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 23:32:39,517][01629] Heartbeat connected on Batcher_0
[2024-10-03 23:32:39,523][01629] Heartbeat connected on LearnerWorker_p0
[2024-10-03 23:32:39,572][01629] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-03 23:32:40,566][03622] Decorrelating experience for 0 frames...
[2024-10-03 23:32:40,577][03616] Decorrelating experience for 0 frames...
[2024-10-03 23:32:41,170][03617] Decorrelating experience for 0 frames...
[2024-10-03 23:32:41,175][03621] Decorrelating experience for 0 frames...
[2024-10-03 23:32:41,174][03619] Decorrelating experience for 0 frames...
[2024-10-03 23:32:41,180][03615] Decorrelating experience for 0 frames...
[2024-10-03 23:32:41,933][03617] Decorrelating experience for 32 frames...
[2024-10-03 23:32:41,941][03619] Decorrelating experience for 32 frames...
[2024-10-03 23:32:42,212][03622] Decorrelating experience for 32 frames...
[2024-10-03 23:32:42,213][03616] Decorrelating experience for 32 frames...
[2024-10-03 23:32:42,684][03620] Decorrelating experience for 0 frames...
[2024-10-03 23:32:42,696][03618] Decorrelating experience for 0 frames...
[2024-10-03 23:32:43,320][03617] Decorrelating experience for 64 frames...
[2024-10-03 23:32:43,342][03619] Decorrelating experience for 64 frames...
[2024-10-03 23:32:43,376][03615] Decorrelating experience for 32 frames...
[2024-10-03 23:32:43,906][03622] Decorrelating experience for 64 frames...
[2024-10-03 23:32:43,969][03618] Decorrelating experience for 32 frames...
[2024-10-03 23:32:43,976][03620] Decorrelating experience for 32 frames...
[2024-10-03 23:32:44,340][01629] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 23:32:44,636][03617] Decorrelating experience for 96 frames...
[2024-10-03 23:32:44,860][03616] Decorrelating experience for 64 frames...
[2024-10-03 23:32:44,860][01629] Heartbeat connected on RolloutWorker_w2
[2024-10-03 23:32:45,160][03615] Decorrelating experience for 64 frames...
[2024-10-03 23:32:45,610][03619] Decorrelating experience for 96 frames...
[2024-10-03 23:32:45,653][03621] Decorrelating experience for 32 frames...
[2024-10-03 23:32:45,667][03622] Decorrelating experience for 96 frames...
[2024-10-03 23:32:45,853][01629] Heartbeat connected on RolloutWorker_w7
[2024-10-03 23:32:46,043][01629] Heartbeat connected on RolloutWorker_w4
[2024-10-03 23:32:46,127][03618] Decorrelating experience for 64 frames...
[2024-10-03 23:32:47,055][03621] Decorrelating experience for 64 frames...
[2024-10-03 23:32:47,207][03620] Decorrelating experience for 64 frames...
[2024-10-03 23:32:47,513][03616] Decorrelating experience for 96 frames...
[2024-10-03 23:32:47,706][01629] Heartbeat connected on RolloutWorker_w1
[2024-10-03 23:32:48,325][03620] Decorrelating experience for 96 frames...
[2024-10-03 23:32:48,547][01629] Heartbeat connected on RolloutWorker_w5
[2024-10-03 23:32:48,642][03621] Decorrelating experience for 96 frames...
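The recurring "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" lines report throughput averaged over three rolling windows, showing nan until each window has data. A minimal sketch of how such windowed figures could be derived from (timestamp, total frames) samples; this is illustrative, not the actual Sample Factory implementation:

```python
from collections import deque


class WindowedFps:
    """Rolling-window throughput tracker: a guess at how the
    (10 sec / 60 sec / 300 sec) FPS figures might be computed."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) pairs

    def record(self, now, total_frames):
        self.history.append((now, total_frames))
        # Drop samples older than the largest window.
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()

    def fps(self, now):
        result = {}
        for window in self.windows:
            recent = [(t, f) for t, f in self.history if now - t <= window]
            if len(recent) < 2 or recent[-1][0] == recent[0][0]:
                result[window] = float("nan")  # mirrors the early 'nan' lines
            else:
                (t0, f0), (t1, f1) = recent[0], recent[-1]
                result[window] = (f1 - f0) / (t1 - t0)
        return result
```

Feeding it the running frame totals every few seconds yields one rate per window, e.g. `WindowedFps().fps(now)[10]` for the 10-second figure.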
[2024-10-03 23:32:48,923][01629] Heartbeat connected on RolloutWorker_w6
[2024-10-03 23:32:49,340][01629] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 20.2. Samples: 202. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 23:32:49,343][01629] Avg episode reward: [(0, '1.696')]
[2024-10-03 23:32:50,548][03615] Decorrelating experience for 96 frames...
[2024-10-03 23:32:50,779][03601] Signal inference workers to stop experience collection...
[2024-10-03 23:32:50,795][03614] InferenceWorker_p0-w0: stopping experience collection
[2024-10-03 23:32:50,861][01629] Heartbeat connected on RolloutWorker_w0
[2024-10-03 23:32:51,073][03618] Decorrelating experience for 96 frames...
[2024-10-03 23:32:51,150][01629] Heartbeat connected on RolloutWorker_w3
[2024-10-03 23:32:54,340][01629] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 154.5. Samples: 2318. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 23:32:54,345][01629] Avg episode reward: [(0, '2.428')]
[2024-10-03 23:32:54,763][03601] Signal inference workers to resume experience collection...
[2024-10-03 23:32:54,764][03614] InferenceWorker_p0-w0: resuming experience collection
[2024-10-03 23:32:59,340][01629] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 154.9. Samples: 3098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:32:59,342][01629] Avg episode reward: [(0, '3.191')]
[2024-10-03 23:33:03,869][03614] Updated weights for policy 0, policy_version 10 (0.0034)
[2024-10-03 23:33:04,340][01629] Fps is (10 sec: 4096.0, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 383.8. Samples: 9596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:33:04,343][01629] Avg episode reward: [(0, '4.013')]
[2024-10-03 23:33:09,340][01629] Fps is (10 sec: 3276.8, 60 sec: 1775.0, 300 sec: 1775.0). Total num frames: 53248. Throughput: 0: 472.0. Samples: 14160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:33:09,348][01629] Avg episode reward: [(0, '4.293')]
[2024-10-03 23:33:14,340][01629] Fps is (10 sec: 2048.0, 60 sec: 1755.5, 300 sec: 1755.5). Total num frames: 61440. Throughput: 0: 429.7. Samples: 15040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:33:14,346][01629] Avg episode reward: [(0, '4.295')]
[2024-10-03 23:33:18,212][03614] Updated weights for policy 0, policy_version 20 (0.0064)
[2024-10-03 23:33:19,340][01629] Fps is (10 sec: 3276.8, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 515.6. Samples: 20622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-10-03 23:33:19,342][01629] Avg episode reward: [(0, '4.160')]
[2024-10-03 23:33:24,341][01629] Fps is (10 sec: 4505.4, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 611.4. Samples: 27512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:33:24,344][01629] Avg episode reward: [(0, '4.325')]
[2024-10-03 23:33:24,355][03601] Saving new best policy, reward=4.325!
[2024-10-03 23:33:29,094][03614] Updated weights for policy 0, policy_version 30 (0.0040)
[2024-10-03 23:33:29,340][01629] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 662.8. Samples: 29826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:33:29,345][01629] Avg episode reward: [(0, '4.316')]
[2024-10-03 23:33:34,340][01629] Fps is (10 sec: 3277.0, 60 sec: 2532.1, 300 sec: 2532.1). Total num frames: 139264. Throughput: 0: 758.6. Samples: 34338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:33:34,351][01629] Avg episode reward: [(0, '4.300')]
[2024-10-03 23:33:39,340][01629] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2525.9). Total num frames: 151552. Throughput: 0: 803.2. Samples: 38464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:33:39,344][01629] Avg episode reward: [(0, '4.198')]
[2024-10-03 23:33:41,719][03614] Updated weights for policy 0, policy_version 40 (0.0035)
[2024-10-03 23:33:44,341][01629] Fps is (10 sec: 2867.0, 60 sec: 2798.9, 300 sec: 2583.6). Total num frames: 167936. Throughput: 0: 854.9. Samples: 41568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:33:44,345][01629] Avg episode reward: [(0, '4.361')]
[2024-10-03 23:33:44,359][03601] Saving new best policy, reward=4.361!
[2024-10-03 23:33:49,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2633.2). Total num frames: 184320. Throughput: 0: 803.2. Samples: 45740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:33:49,347][01629] Avg episode reward: [(0, '4.333')]
[2024-10-03 23:33:53,283][03614] Updated weights for policy 0, policy_version 50 (0.0017)
[2024-10-03 23:33:54,340][01629] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 2785.3). Total num frames: 208896. Throughput: 0: 845.4. Samples: 52202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:33:54,347][01629] Avg episode reward: [(0, '4.194')]
[2024-10-03 23:33:59,344][01629] Fps is (10 sec: 4503.9, 60 sec: 3481.4, 300 sec: 2867.1). Total num frames: 229376. Throughput: 0: 904.7. Samples: 55756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:33:59,347][01629] Avg episode reward: [(0, '4.341')]
[2024-10-03 23:34:04,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2843.1). Total num frames: 241664. Throughput: 0: 890.5. Samples: 60696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:34:04,345][01629] Avg episode reward: [(0, '4.417')]
[2024-10-03 23:34:04,398][03601] Saving new best policy, reward=4.417!
[2024-10-03 23:34:04,412][03614] Updated weights for policy 0, policy_version 60 (0.0025)
[2024-10-03 23:34:09,340][01629] Fps is (10 sec: 3687.8, 60 sec: 3549.9, 300 sec: 2958.2). Total num frames: 266240.
Throughput: 0: 864.8. Samples: 66428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:34:09,342][01629] Avg episode reward: [(0, '4.519')]
[2024-10-03 23:34:09,347][03601] Saving new best policy, reward=4.519!
[2024-10-03 23:34:13,586][03614] Updated weights for policy 0, policy_version 70 (0.0029)
[2024-10-03 23:34:14,342][01629] Fps is (10 sec: 4504.8, 60 sec: 3754.6, 300 sec: 3018.1). Total num frames: 286720. Throughput: 0: 889.6. Samples: 69858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:34:14,344][01629] Avg episode reward: [(0, '4.424')]
[2024-10-03 23:34:14,355][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000070_286720.pth...
[2024-10-03 23:34:19,346][01629] Fps is (10 sec: 3684.3, 60 sec: 3617.8, 300 sec: 3030.9). Total num frames: 303104. Throughput: 0: 920.1. Samples: 75746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:34:19,349][01629] Avg episode reward: [(0, '4.272')]
[2024-10-03 23:34:24,340][01629] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3042.8). Total num frames: 319488. Throughput: 0: 932.0. Samples: 80404. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 23:34:24,343][01629] Avg episode reward: [(0, '4.259')]
[2024-10-03 23:34:25,435][03614] Updated weights for policy 0, policy_version 80 (0.0042)
[2024-10-03 23:34:29,340][01629] Fps is (10 sec: 4098.4, 60 sec: 3686.4, 300 sec: 3127.9). Total num frames: 344064. Throughput: 0: 941.3. Samples: 83926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:34:29,343][01629] Avg episode reward: [(0, '4.364')]
[2024-10-03 23:34:34,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3134.3). Total num frames: 360448. Throughput: 0: 999.4. Samples: 90712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:34:34,344][01629] Avg episode reward: [(0, '4.520')]
[2024-10-03 23:34:36,259][03614] Updated weights for policy 0, policy_version 90 (0.0020)
[2024-10-03 23:34:39,342][01629] Fps is (10 sec: 2866.7, 60 sec: 3686.3, 300 sec: 3106.1). Total num frames: 372736. Throughput: 0: 932.1. Samples: 94146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:34:39,345][01629] Avg episode reward: [(0, '4.528')]
[2024-10-03 23:34:39,348][03601] Saving new best policy, reward=4.528!
[2024-10-03 23:34:44,340][01629] Fps is (10 sec: 2457.6, 60 sec: 3618.2, 300 sec: 3080.2). Total num frames: 385024. Throughput: 0: 890.4. Samples: 95822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:34:44,343][01629] Avg episode reward: [(0, '4.546')]
[2024-10-03 23:34:44,429][03601] Saving new best policy, reward=4.546!
[2024-10-03 23:34:48,850][03614] Updated weights for policy 0, policy_version 100 (0.0019)
[2024-10-03 23:34:49,340][01629] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3150.8). Total num frames: 409600. Throughput: 0: 910.8. Samples: 101682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:34:49,346][01629] Avg episode reward: [(0, '4.472')]
[2024-10-03 23:34:54,346][01629] Fps is (10 sec: 4503.1, 60 sec: 3686.1, 300 sec: 3185.7). Total num frames: 430080. Throughput: 0: 921.1. Samples: 107882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:34:54,348][01629] Avg episode reward: [(0, '4.445')]
[2024-10-03 23:34:59,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3159.8). Total num frames: 442368. Throughput: 0: 889.7. Samples: 109894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:34:59,344][01629] Avg episode reward: [(0, '4.312')]
[2024-10-03 23:35:01,183][03614] Updated weights for policy 0, policy_version 110 (0.0017)
[2024-10-03 23:35:04,340][01629] Fps is (10 sec: 3278.6, 60 sec: 3686.4, 300 sec: 3192.1). Total num frames: 462848. Throughput: 0: 873.5. Samples: 115048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:35:04,345][01629] Avg episode reward: [(0, '4.327')]
[2024-10-03 23:35:09,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3249.5). Total num frames: 487424. Throughput: 0: 924.9. Samples: 122024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:35:09,342][01629] Avg episode reward: [(0, '4.530')]
[2024-10-03 23:35:10,031][03614] Updated weights for policy 0, policy_version 120 (0.0042)
[2024-10-03 23:35:14,344][01629] Fps is (10 sec: 4094.5, 60 sec: 3618.0, 300 sec: 3250.3). Total num frames: 503808. Throughput: 0: 913.1. Samples: 125020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:35:14,346][01629] Avg episode reward: [(0, '4.400')]
[2024-10-03 23:35:19,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.5, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 857.0. Samples: 129278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:35:19,342][01629] Avg episode reward: [(0, '4.432')]
[2024-10-03 23:35:21,487][03614] Updated weights for policy 0, policy_version 130 (0.0020)
[2024-10-03 23:35:24,341][01629] Fps is (10 sec: 4097.4, 60 sec: 3754.6, 300 sec: 3301.6). Total num frames: 544768. Throughput: 0: 937.3. Samples: 136322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:35:24,347][01629] Avg episode reward: [(0, '4.571')]
[2024-10-03 23:35:24,356][03601] Saving new best policy, reward=4.571!
[2024-10-03 23:35:29,343][01629] Fps is (10 sec: 4504.4, 60 sec: 3686.2, 300 sec: 3324.9). Total num frames: 565248. Throughput: 0: 977.2. Samples: 139800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:35:29,345][01629] Avg episode reward: [(0, '4.352')]
[2024-10-03 23:35:31,893][03614] Updated weights for policy 0, policy_version 140 (0.0019)
[2024-10-03 23:35:34,340][01629] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3300.2). Total num frames: 577536. Throughput: 0: 949.2.
Samples: 144396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:35:34,348][01629] Avg episode reward: [(0, '4.372')]
[2024-10-03 23:35:39,340][01629] Fps is (10 sec: 3687.4, 60 sec: 3823.0, 300 sec: 3345.1). Total num frames: 602112. Throughput: 0: 946.9. Samples: 150486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:35:39,346][01629] Avg episode reward: [(0, '4.463')]
[2024-10-03 23:35:41,851][03614] Updated weights for policy 0, policy_version 150 (0.0016)
[2024-10-03 23:35:44,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3365.4). Total num frames: 622592. Throughput: 0: 980.6. Samples: 154020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:35:44,351][01629] Avg episode reward: [(0, '4.331')]
[2024-10-03 23:35:49,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3363.0). Total num frames: 638976. Throughput: 0: 991.3. Samples: 159656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:35:49,348][01629] Avg episode reward: [(0, '4.458')]
[2024-10-03 23:35:53,306][03614] Updated weights for policy 0, policy_version 160 (0.0020)
[2024-10-03 23:35:54,340][01629] Fps is (10 sec: 3686.5, 60 sec: 3823.3, 300 sec: 3381.8). Total num frames: 659456. Throughput: 0: 952.6. Samples: 164892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:35:54,346][01629] Avg episode reward: [(0, '4.527')]
[2024-10-03 23:35:59,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3399.7). Total num frames: 679936. Throughput: 0: 966.0. Samples: 168488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:35:59,346][01629] Avg episode reward: [(0, '4.325')]
[2024-10-03 23:36:02,543][03614] Updated weights for policy 0, policy_version 170 (0.0028)
[2024-10-03 23:36:04,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3416.7). Total num frames: 700416. Throughput: 0: 1012.1. Samples: 174822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:36:04,344][01629] Avg episode reward: [(0, '4.430')]
[2024-10-03 23:36:09,340][01629] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3413.3). Total num frames: 716800. Throughput: 0: 954.7. Samples: 179282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:09,348][01629] Avg episode reward: [(0, '4.611')]
[2024-10-03 23:36:09,351][03601] Saving new best policy, reward=4.611!
[2024-10-03 23:36:13,654][03614] Updated weights for policy 0, policy_version 180 (0.0037)
[2024-10-03 23:36:14,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3429.2). Total num frames: 737280. Throughput: 0: 953.6. Samples: 182710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:14,344][01629] Avg episode reward: [(0, '4.739')]
[2024-10-03 23:36:14,354][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth...
[2024-10-03 23:36:14,495][03601] Saving new best policy, reward=4.739!
[2024-10-03 23:36:19,340][01629] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3444.4). Total num frames: 757760. Throughput: 0: 1001.6. Samples: 189468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:36:19,345][01629] Avg episode reward: [(0, '4.627')]
[2024-10-03 23:36:24,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3422.4). Total num frames: 770048. Throughput: 0: 954.2. Samples: 193426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:24,343][01629] Avg episode reward: [(0, '4.561')]
[2024-10-03 23:36:26,771][03614] Updated weights for policy 0, policy_version 190 (0.0028)
[2024-10-03 23:36:29,343][01629] Fps is (10 sec: 2456.9, 60 sec: 3618.1, 300 sec: 3401.4). Total num frames: 782336. Throughput: 0: 914.5. Samples: 195174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:29,345][01629] Avg episode reward: [(0, '4.642')]
[2024-10-03 23:36:34,340][01629] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3416.2). Total num frames: 802816. Throughput: 0: 906.3. Samples: 200442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:34,343][01629] Avg episode reward: [(0, '4.778')]
[2024-10-03 23:36:34,351][03601] Saving new best policy, reward=4.778!
[2024-10-03 23:36:36,935][03614] Updated weights for policy 0, policy_version 200 (0.0023)
[2024-10-03 23:36:39,340][01629] Fps is (10 sec: 4506.8, 60 sec: 3754.7, 300 sec: 3447.5). Total num frames: 827392. Throughput: 0: 940.3. Samples: 207206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:36:39,351][01629] Avg episode reward: [(0, '4.781')]
[2024-10-03 23:36:39,353][03601] Saving new best policy, reward=4.781!
[2024-10-03 23:36:44,342][01629] Fps is (10 sec: 3685.9, 60 sec: 3618.0, 300 sec: 3427.3). Total num frames: 839680. Throughput: 0: 903.9. Samples: 209166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:44,344][01629] Avg episode reward: [(0, '4.584')]
[2024-10-03 23:36:48,485][03614] Updated weights for policy 0, policy_version 210 (0.0017)
[2024-10-03 23:36:49,343][01629] Fps is (10 sec: 3275.9, 60 sec: 3686.2, 300 sec: 3440.6). Total num frames: 860160. Throughput: 0: 885.8. Samples: 214684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:49,345][01629] Avg episode reward: [(0, '4.459')]
[2024-10-03 23:36:54,340][01629] Fps is (10 sec: 4506.3, 60 sec: 3754.7, 300 sec: 3469.6). Total num frames: 884736. Throughput: 0: 943.5. Samples: 221740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:54,346][01629] Avg episode reward: [(0, '4.444')]
[2024-10-03 23:36:58,659][03614] Updated weights for policy 0, policy_version 220 (0.0030)
[2024-10-03 23:36:59,341][01629] Fps is (10 sec: 4096.9, 60 sec: 3686.4, 300 sec: 3465.8). Total num frames: 901120.
Throughput: 0: 930.4. Samples: 224578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:36:59,345][01629] Avg episode reward: [(0, '4.428')]
[2024-10-03 23:37:04,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3462.3). Total num frames: 917504. Throughput: 0: 878.8. Samples: 229014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:37:04,343][01629] Avg episode reward: [(0, '4.536')]
[2024-10-03 23:37:08,639][03614] Updated weights for policy 0, policy_version 230 (0.0026)
[2024-10-03 23:37:09,340][01629] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3489.2). Total num frames: 942080. Throughput: 0: 947.6. Samples: 236070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:37:09,343][01629] Avg episode reward: [(0, '4.623')]
[2024-10-03 23:37:14,341][01629] Fps is (10 sec: 4505.1, 60 sec: 3754.6, 300 sec: 3500.2). Total num frames: 962560. Throughput: 0: 987.7. Samples: 239620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:37:14,344][01629] Avg episode reward: [(0, '4.825')]
[2024-10-03 23:37:14,358][03601] Saving new best policy, reward=4.825!
[2024-10-03 23:37:19,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3496.2). Total num frames: 978944. Throughput: 0: 971.7. Samples: 244170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:37:19,347][01629] Avg episode reward: [(0, '4.923')]
[2024-10-03 23:37:19,349][03601] Saving new best policy, reward=4.923!
[2024-10-03 23:37:20,053][03614] Updated weights for policy 0, policy_version 240 (0.0038)
[2024-10-03 23:37:24,340][01629] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3506.8). Total num frames: 999424. Throughput: 0: 962.6. Samples: 250524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:37:24,344][01629] Avg episode reward: [(0, '4.853')]
[2024-10-03 23:37:28,801][03614] Updated weights for policy 0, policy_version 250 (0.0023)
[2024-10-03 23:37:29,340][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3531.0). Total num frames: 1024000. Throughput: 0: 997.7. Samples: 254060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:37:29,346][01629] Avg episode reward: [(0, '4.847')]
[2024-10-03 23:37:34,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3526.7). Total num frames: 1040384. Throughput: 0: 998.4. Samples: 259608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:37:34,346][01629] Avg episode reward: [(0, '4.857')]
[2024-10-03 23:37:39,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 962.8. Samples: 265066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:37:39,342][01629] Avg episode reward: [(0, '4.918')]
[2024-10-03 23:37:40,233][03614] Updated weights for policy 0, policy_version 260 (0.0046)
[2024-10-03 23:37:44,340][01629] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 978.1. Samples: 268594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:37:44,348][01629] Avg episode reward: [(0, '4.730')]
[2024-10-03 23:37:49,340][01629] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 1021.8. Samples: 274994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:37:49,346][01629] Avg episode reward: [(0, '4.832')]
[2024-10-03 23:37:50,471][03614] Updated weights for policy 0, policy_version 270 (0.0034)
[2024-10-03 23:37:54,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1118208. Throughput: 0: 961.4. Samples: 279334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:37:54,342][01629] Avg episode reward: [(0, '4.992')]
[2024-10-03 23:37:54,359][03601] Saving new best policy, reward=4.992!
[2024-10-03 23:37:59,341][01629] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1134592. Throughput: 0: 957.9. Samples: 282726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:37:59,354][01629] Avg episode reward: [(0, '5.087')]
[2024-10-03 23:37:59,360][03601] Saving new best policy, reward=5.087!
[2024-10-03 23:38:02,936][03614] Updated weights for policy 0, policy_version 280 (0.0021)
[2024-10-03 23:38:04,340][01629] Fps is (10 sec: 2867.1, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1146880. Throughput: 0: 952.8. Samples: 287046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:38:04,343][01629] Avg episode reward: [(0, '5.053')]
[2024-10-03 23:38:09,354][01629] Fps is (10 sec: 2863.5, 60 sec: 3685.6, 300 sec: 3734.8). Total num frames: 1163264. Throughput: 0: 902.3. Samples: 291138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:38:09,356][01629] Avg episode reward: [(0, '5.369')]
[2024-10-03 23:38:09,359][03601] Saving new best policy, reward=5.369!
[2024-10-03 23:38:14,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3721.1). Total num frames: 1183744. Throughput: 0: 880.9. Samples: 293700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:38:14,342][01629] Avg episode reward: [(0, '5.434')]
[2024-10-03 23:38:14,355][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000289_1183744.pth...
[2024-10-03 23:38:14,490][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000070_286720.pth
[2024-10-03 23:38:14,508][03601] Saving new best policy, reward=5.434!
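The learner bounds disk usage by deleting the oldest checkpoint after writing a new one, as in the "Saving checkpoint_000000289_1183744.pth... / Removing checkpoint_000000070_286720.pth" pair above. A minimal sketch of that keep-newest-N rotation, assuming the zero-padded `checkpoint_<version>_<frames>.pth` naming makes lexicographic order match creation order; the function name and `keep` parameter are illustrative, not Sample Factory's actual API:

```python
import glob
import os


def rotate_checkpoints(ckpt_dir, keep=2):
    # Zero-padded version numbers sort lexicographically in creation order.
    checkpoints = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    removed = []
    for old in checkpoints[:-keep]:  # everything except the newest `keep`
        os.remove(old)
        removed.append(old)
    return removed
```

With `keep=2`, saving the version-289 checkpoint would leave versions 180 and 289 on disk and remove version 70, matching the log.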
[2024-10-03 23:38:15,218][03614] Updated weights for policy 0, policy_version 290 (0.0028)
[2024-10-03 23:38:19,340][01629] Fps is (10 sec: 4101.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1204224. Throughput: 0: 910.1. Samples: 300562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:38:19,347][01629] Avg episode reward: [(0, '5.569')]
[2024-10-03 23:38:19,350][03601] Saving new best policy, reward=5.569!
[2024-10-03 23:38:24,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1220608. Throughput: 0: 912.4. Samples: 306124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:38:24,346][01629] Avg episode reward: [(0, '5.785')]
[2024-10-03 23:38:24,372][03601] Saving new best policy, reward=5.785!
[2024-10-03 23:38:26,284][03614] Updated weights for policy 0, policy_version 300 (0.0017)
[2024-10-03 23:38:29,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 1236992. Throughput: 0: 879.8. Samples: 308184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:38:29,347][01629] Avg episode reward: [(0, '5.666')]
[2024-10-03 23:38:34,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1261568. Throughput: 0: 882.0. Samples: 314682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:38:34,343][01629] Avg episode reward: [(0, '5.664')]
[2024-10-03 23:38:35,747][03614] Updated weights for policy 0, policy_version 310 (0.0013)
[2024-10-03 23:38:39,340][01629] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1282048. Throughput: 0: 934.8. Samples: 321402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-10-03 23:38:39,343][01629] Avg episode reward: [(0, '5.759')]
[2024-10-03 23:38:44,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 1298432. Throughput: 0: 907.2. Samples: 323550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:38:44,342][01629] Avg episode reward: [(0, '5.702')]
[2024-10-03 23:38:46,939][03614] Updated weights for policy 0, policy_version 320 (0.0041)
[2024-10-03 23:38:49,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1318912. Throughput: 0: 937.1. Samples: 329214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:38:49,342][01629] Avg episode reward: [(0, '5.657')]
[2024-10-03 23:38:54,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1343488. Throughput: 0: 1004.1. Samples: 336310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:38:54,345][01629] Avg episode reward: [(0, '5.948')]
[2024-10-03 23:38:54,355][03601] Saving new best policy, reward=5.948!
[2024-10-03 23:38:55,908][03614] Updated weights for policy 0, policy_version 330 (0.0013)
[2024-10-03 23:38:59,342][01629] Fps is (10 sec: 4095.2, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 1359872. Throughput: 0: 1006.2. Samples: 338982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:38:59,348][01629] Avg episode reward: [(0, '6.307')]
[2024-10-03 23:38:59,350][03601] Saving new best policy, reward=6.307!
[2024-10-03 23:39:04,340][01629] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 1380352. Throughput: 0: 955.9. Samples: 343580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:39:04,344][01629] Avg episode reward: [(0, '7.143')]
[2024-10-03 23:39:04,353][03601] Saving new best policy, reward=7.143!
[2024-10-03 23:39:07,014][03614] Updated weights for policy 0, policy_version 340 (0.0029)
[2024-10-03 23:39:09,340][01629] Fps is (10 sec: 4096.7, 60 sec: 3960.3, 300 sec: 3776.7). Total num frames: 1400832. Throughput: 0: 988.3. Samples: 350596.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:39:09,348][01629] Avg episode reward: [(0, '6.994')]
[2024-10-03 23:39:14,340][01629] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1413120. Throughput: 0: 993.8. Samples: 352904. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:39:14,347][01629] Avg episode reward: [(0, '7.258')]
[2024-10-03 23:39:14,359][03601] Saving new best policy, reward=7.258!
[2024-10-03 23:39:19,340][01629] Fps is (10 sec: 2457.7, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1425408. Throughput: 0: 924.0. Samples: 356260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:39:19,345][01629] Avg episode reward: [(0, '7.312')]
[2024-10-03 23:39:19,347][03601] Saving new best policy, reward=7.312!
[2024-10-03 23:39:21,335][03614] Updated weights for policy 0, policy_version 350 (0.0035)
[2024-10-03 23:39:24,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1445888. Throughput: 0: 895.9. Samples: 361718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:39:24,348][01629] Avg episode reward: [(0, '6.963')]
[2024-10-03 23:39:29,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1470464. Throughput: 0: 927.5. Samples: 365288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:39:29,347][01629] Avg episode reward: [(0, '6.989')]
[2024-10-03 23:39:30,046][03614] Updated weights for policy 0, policy_version 360 (0.0021)
[2024-10-03 23:39:34,342][01629] Fps is (10 sec: 4095.3, 60 sec: 3754.6, 300 sec: 3776.7). Total num frames: 1486848. Throughput: 0: 940.2. Samples: 371524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:39:34,345][01629] Avg episode reward: [(0, '6.926')]
[2024-10-03 23:39:39,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1503232. Throughput: 0: 883.1. Samples: 376050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:39:39,345][01629] Avg episode reward: [(0, '7.479')]
[2024-10-03 23:39:39,347][03601] Saving new best policy, reward=7.479!
[2024-10-03 23:39:41,773][03614] Updated weights for policy 0, policy_version 370 (0.0023)
[2024-10-03 23:39:44,340][01629] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1527808. Throughput: 0: 899.6. Samples: 379464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:39:44,345][01629] Avg episode reward: [(0, '7.684')]
[2024-10-03 23:39:44,352][03601] Saving new best policy, reward=7.684!
[2024-10-03 23:39:49,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 1548288. Throughput: 0: 953.0. Samples: 386466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:39:49,345][01629] Avg episode reward: [(0, '7.517')]
[2024-10-03 23:39:52,064][03614] Updated weights for policy 0, policy_version 380 (0.0047)
[2024-10-03 23:39:54,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 1560576. Throughput: 0: 893.9. Samples: 390820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:39:54,344][01629] Avg episode reward: [(0, '7.487')]
[2024-10-03 23:39:59,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 1585152. Throughput: 0: 908.4. Samples: 393782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:39:59,346][01629] Avg episode reward: [(0, '7.727')]
[2024-10-03 23:39:59,348][03601] Saving new best policy, reward=7.727!
[2024-10-03 23:40:01,872][03614] Updated weights for policy 0, policy_version 390 (0.0039)
[2024-10-03 23:40:04,341][01629] Fps is (10 sec: 4505.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1605632. Throughput: 0: 991.3. Samples: 400868.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:40:04,349][01629] Avg episode reward: [(0, '8.406')]
[2024-10-03 23:40:04,360][03601] Saving new best policy, reward=8.406!
[2024-10-03 23:40:09,341][01629] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1622016. Throughput: 0: 983.0. Samples: 405952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:40:09,343][01629] Avg episode reward: [(0, '8.747')]
[2024-10-03 23:40:09,349][03601] Saving new best policy, reward=8.747!
[2024-10-03 23:40:13,696][03614] Updated weights for policy 0, policy_version 400 (0.0024)
[2024-10-03 23:40:14,340][01629] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1638400. Throughput: 0: 951.1. Samples: 408088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:40:14,344][01629] Avg episode reward: [(0, '9.152')]
[2024-10-03 23:40:14,353][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth...
[2024-10-03 23:40:14,478][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth
[2024-10-03 23:40:14,504][03601] Saving new best policy, reward=9.152!
[2024-10-03 23:40:19,340][01629] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1662976. Throughput: 0: 962.7. Samples: 414842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:40:19,343][01629] Avg episode reward: [(0, '9.521')]
[2024-10-03 23:40:19,347][03601] Saving new best policy, reward=9.521!
[2024-10-03 23:40:22,792][03614] Updated weights for policy 0, policy_version 410 (0.0019)
[2024-10-03 23:40:24,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.6). Total num frames: 1683456. Throughput: 0: 999.2. Samples: 421014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:40:24,343][01629] Avg episode reward: [(0, '9.095')]
[2024-10-03 23:40:29,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1695744. Throughput: 0: 968.8. Samples: 423058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:40:29,342][01629] Avg episode reward: [(0, '9.054')]
[2024-10-03 23:40:34,158][03614] Updated weights for policy 0, policy_version 420 (0.0024)
[2024-10-03 23:40:34,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3790.5). Total num frames: 1720320. Throughput: 0: 949.2. Samples: 429182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:40:34,343][01629] Avg episode reward: [(0, '8.503')]
[2024-10-03 23:40:39,342][01629] Fps is (10 sec: 4504.9, 60 sec: 3959.4, 300 sec: 3790.5). Total num frames: 1740800. Throughput: 0: 1003.7. Samples: 435988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:40:39,350][01629] Avg episode reward: [(0, '9.118')]
[2024-10-03 23:40:44,345][01629] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3790.5). Total num frames: 1757184. Throughput: 0: 989.9. Samples: 438334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:40:44,357][01629] Avg episode reward: [(0, '8.837')]
[2024-10-03 23:40:45,079][03614] Updated weights for policy 0, policy_version 430 (0.0025)
[2024-10-03 23:40:49,340][01629] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1777664. Throughput: 0: 944.5. Samples: 443370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:40:49,342][01629] Avg episode reward: [(0, '9.720')]
[2024-10-03 23:40:49,349][03601] Saving new best policy, reward=9.720!
[2024-10-03 23:40:54,329][03614] Updated weights for policy 0, policy_version 440 (0.0026)
[2024-10-03 23:40:54,340][01629] Fps is (10 sec: 4507.7, 60 sec: 4027.7, 300 sec: 3804.4). Total num frames: 1802240. Throughput: 0: 988.5. Samples: 450434.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:40:54,344][01629] Avg episode reward: [(0, '9.490')]
[2024-10-03 23:40:59,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1818624. Throughput: 0: 1014.2. Samples: 453728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:40:59,343][01629] Avg episode reward: [(0, '9.454')]
[2024-10-03 23:41:04,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 1835008. Throughput: 0: 958.7. Samples: 457984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:41:04,346][01629] Avg episode reward: [(0, '9.307')]
[2024-10-03 23:41:05,887][03614] Updated weights for policy 0, policy_version 450 (0.0028)
[2024-10-03 23:41:09,340][01629] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1859584. Throughput: 0: 972.4. Samples: 464772. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:41:09,343][01629] Avg episode reward: [(0, '9.265')]
[2024-10-03 23:41:14,340][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3804.4). Total num frames: 1880064. Throughput: 0: 1005.9. Samples: 468324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:41:14,342][01629] Avg episode reward: [(0, '9.834')]
[2024-10-03 23:41:14,359][03601] Saving new best policy, reward=9.834!
[2024-10-03 23:41:15,194][03614] Updated weights for policy 0, policy_version 460 (0.0024)
[2024-10-03 23:41:19,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1892352. Throughput: 0: 980.7. Samples: 473312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:41:19,345][01629] Avg episode reward: [(0, '10.874')]
[2024-10-03 23:41:19,354][03601] Saving new best policy, reward=10.874!
[2024-10-03 23:41:24,340][01629] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1908736. Throughput: 0: 934.2. Samples: 478024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:41:24,345][01629] Avg episode reward: [(0, '11.719')]
[2024-10-03 23:41:24,355][03601] Saving new best policy, reward=11.719!
[2024-10-03 23:41:28,986][03614] Updated weights for policy 0, policy_version 470 (0.0022)
[2024-10-03 23:41:29,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1925120. Throughput: 0: 927.8. Samples: 480080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:41:29,344][01629] Avg episode reward: [(0, '11.988')]
[2024-10-03 23:41:29,350][03601] Saving new best policy, reward=11.988!
[2024-10-03 23:41:34,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 1941504. Throughput: 0: 930.4. Samples: 485240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:41:34,343][01629] Avg episode reward: [(0, '12.083')]
[2024-10-03 23:41:34,353][03601] Saving new best policy, reward=12.083!
[2024-10-03 23:41:39,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3790.6). Total num frames: 1957888. Throughput: 0: 879.5. Samples: 490010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:41:39,343][01629] Avg episode reward: [(0, '11.956')]
[2024-10-03 23:41:40,333][03614] Updated weights for policy 0, policy_version 480 (0.0020)
[2024-10-03 23:41:44,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3755.0, 300 sec: 3804.5). Total num frames: 1982464. Throughput: 0: 884.1. Samples: 493512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:41:44,345][01629] Avg episode reward: [(0, '11.624')]
[2024-10-03 23:41:49,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2002944. Throughput: 0: 945.0. Samples: 500508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:41:49,344][01629] Avg episode reward: [(0, '12.494')]
[2024-10-03 23:41:49,349][03601] Saving new best policy, reward=12.494!
[2024-10-03 23:41:50,405][03614] Updated weights for policy 0, policy_version 490 (0.0025)
[2024-10-03 23:41:54,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 2015232. Throughput: 0: 887.9. Samples: 504726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:41:54,346][01629] Avg episode reward: [(0, '13.864')]
[2024-10-03 23:41:54,359][03601] Saving new best policy, reward=13.864!
[2024-10-03 23:41:59,346][01629] Fps is (10 sec: 3275.0, 60 sec: 3617.8, 300 sec: 3790.5). Total num frames: 2035712. Throughput: 0: 876.2. Samples: 507756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:41:59,350][01629] Avg episode reward: [(0, '14.393')]
[2024-10-03 23:41:59,381][03601] Saving new best policy, reward=14.393!
[2024-10-03 23:42:02,351][03614] Updated weights for policy 0, policy_version 500 (0.0025)
[2024-10-03 23:42:04,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2052096. Throughput: 0: 875.1. Samples: 512690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:42:04,343][01629] Avg episode reward: [(0, '16.542')]
[2024-10-03 23:42:04,355][03601] Saving new best policy, reward=16.542!
[2024-10-03 23:42:09,340][01629] Fps is (10 sec: 2868.8, 60 sec: 3413.3, 300 sec: 3735.0). Total num frames: 2064384. Throughput: 0: 856.8. Samples: 516582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:42:09,348][01629] Avg episode reward: [(0, '16.852')]
[2024-10-03 23:42:09,358][03601] Saving new best policy, reward=16.852!
[2024-10-03 23:42:14,341][01629] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3735.0). Total num frames: 2080768. Throughput: 0: 857.7. Samples: 518678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:42:14,343][01629] Avg episode reward: [(0, '16.510')]
[2024-10-03 23:42:14,353][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000508_2080768.pth...
[2024-10-03 23:42:14,489][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000289_1183744.pth
[2024-10-03 23:42:15,444][03614] Updated weights for policy 0, policy_version 510 (0.0030)
[2024-10-03 23:42:19,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 2105344. Throughput: 0: 895.2. Samples: 525524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:42:19,342][01629] Avg episode reward: [(0, '15.853')]
[2024-10-03 23:42:24,344][01629] Fps is (10 sec: 4504.1, 60 sec: 3617.9, 300 sec: 3734.9). Total num frames: 2125824. Throughput: 0: 928.5. Samples: 531796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:42:24,349][01629] Avg episode reward: [(0, '15.177')]
[2024-10-03 23:42:25,315][03614] Updated weights for policy 0, policy_version 520 (0.0049)
[2024-10-03 23:42:29,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2138112. Throughput: 0: 898.3. Samples: 533934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:42:29,343][01629] Avg episode reward: [(0, '14.747')]
[2024-10-03 23:42:34,340][01629] Fps is (10 sec: 3687.7, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2162688. Throughput: 0: 877.1. Samples: 539976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:42:34,343][01629] Avg episode reward: [(0, '14.718')]
[2024-10-03 23:42:35,751][03614] Updated weights for policy 0, policy_version 530 (0.0022)
[2024-10-03 23:42:39,341][01629] Fps is (10 sec: 4914.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2187264. Throughput: 0: 935.2. Samples: 546810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:42:39,344][01629] Avg episode reward: [(0, '14.783')]
[2024-10-03 23:42:44,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2199552. Throughput: 0: 921.8. Samples: 549234.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:42:44,349][01629] Avg episode reward: [(0, '14.419')]
[2024-10-03 23:42:47,252][03614] Updated weights for policy 0, policy_version 540 (0.0025)
[2024-10-03 23:42:49,340][01629] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2220032. Throughput: 0: 921.5. Samples: 554158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:42:49,346][01629] Avg episode reward: [(0, '13.769')]
[2024-10-03 23:42:54,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2244608. Throughput: 0: 990.9. Samples: 561174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:42:54,343][01629] Avg episode reward: [(0, '16.374')]
[2024-10-03 23:42:55,843][03614] Updated weights for policy 0, policy_version 550 (0.0032)
[2024-10-03 23:42:59,342][01629] Fps is (10 sec: 4095.3, 60 sec: 3754.9, 300 sec: 3776.6). Total num frames: 2260992. Throughput: 0: 1018.0. Samples: 564488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:42:59,345][01629] Avg episode reward: [(0, '16.379')]
[2024-10-03 23:43:04,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.8). Total num frames: 2277376. Throughput: 0: 959.8. Samples: 568716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:43:04,345][01629] Avg episode reward: [(0, '16.242')]
[2024-10-03 23:43:07,660][03614] Updated weights for policy 0, policy_version 560 (0.0023)
[2024-10-03 23:43:09,340][01629] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2301952. Throughput: 0: 967.8. Samples: 575344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:43:09,349][01629] Avg episode reward: [(0, '17.341')]
[2024-10-03 23:43:09,353][03601] Saving new best policy, reward=17.341!
[2024-10-03 23:43:14,343][01629] Fps is (10 sec: 4504.3, 60 sec: 4027.6, 300 sec: 3790.5). Total num frames: 2322432. Throughput: 0: 999.2. Samples: 578900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:43:14,347][01629] Avg episode reward: [(0, '15.062')]
[2024-10-03 23:43:18,495][03614] Updated weights for policy 0, policy_version 570 (0.0023)
[2024-10-03 23:43:19,341][01629] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2334720. Throughput: 0: 972.3. Samples: 583730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:43:19,344][01629] Avg episode reward: [(0, '15.201')]
[2024-10-03 23:43:24,340][01629] Fps is (10 sec: 3687.5, 60 sec: 3891.4, 300 sec: 3804.4). Total num frames: 2359296. Throughput: 0: 951.8. Samples: 589642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:43:24,347][01629] Avg episode reward: [(0, '15.322')]
[2024-10-03 23:43:27,910][03614] Updated weights for policy 0, policy_version 580 (0.0031)
[2024-10-03 23:43:29,340][01629] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 2379776. Throughput: 0: 976.5. Samples: 593176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:43:29,342][01629] Avg episode reward: [(0, '15.175')]
[2024-10-03 23:43:34,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 2396160. Throughput: 0: 998.2. Samples: 599078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:43:34,344][01629] Avg episode reward: [(0, '15.919')]
[2024-10-03 23:43:39,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2412544. Throughput: 0: 946.1. Samples: 603748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:43:39,348][01629] Avg episode reward: [(0, '16.899')]
[2024-10-03 23:43:39,563][03614] Updated weights for policy 0, policy_version 590 (0.0037)
[2024-10-03 23:43:44,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2437120. Throughput: 0: 950.5. Samples: 607258.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:43:44,348][01629] Avg episode reward: [(0, '17.914')]
[2024-10-03 23:43:44,361][03601] Saving new best policy, reward=17.914!
[2024-10-03 23:43:48,679][03614] Updated weights for policy 0, policy_version 600 (0.0023)
[2024-10-03 23:43:49,347][01629] Fps is (10 sec: 4502.6, 60 sec: 3959.0, 300 sec: 3776.6). Total num frames: 2457600. Throughput: 0: 1013.7. Samples: 614340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:43:49,349][01629] Avg episode reward: [(0, '18.754')]
[2024-10-03 23:43:49,351][03601] Saving new best policy, reward=18.754!
[2024-10-03 23:43:54,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2469888. Throughput: 0: 957.3. Samples: 618422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 23:43:54,346][01629] Avg episode reward: [(0, '18.718')]
[2024-10-03 23:43:59,340][01629] Fps is (10 sec: 3688.9, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 2494464. Throughput: 0: 947.3. Samples: 621526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:43:59,347][01629] Avg episode reward: [(0, '18.841')]
[2024-10-03 23:43:59,349][03601] Saving new best policy, reward=18.841!
[2024-10-03 23:43:59,902][03614] Updated weights for policy 0, policy_version 610 (0.0019)
[2024-10-03 23:44:04,340][01629] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 2519040. Throughput: 0: 996.7. Samples: 628582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:44:04,348][01629] Avg episode reward: [(0, '18.898')]
[2024-10-03 23:44:04,359][03601] Saving new best policy, reward=18.898!
[2024-10-03 23:44:09,340][01629] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2531328. Throughput: 0: 977.1. Samples: 633610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:44:09,345][01629] Avg episode reward: [(0, '19.092')]
[2024-10-03 23:44:09,350][03601] Saving new best policy, reward=19.092!
[2024-10-03 23:44:11,362][03614] Updated weights for policy 0, policy_version 620 (0.0033)
[2024-10-03 23:44:14,344][01629] Fps is (10 sec: 3275.5, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2551808. Throughput: 0: 943.9. Samples: 635656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:44:14,351][01629] Avg episode reward: [(0, '18.995')]
[2024-10-03 23:44:14,366][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000623_2551808.pth...
[2024-10-03 23:44:14,498][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth
[2024-10-03 23:44:19,340][01629] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2572288. Throughput: 0: 967.8. Samples: 642628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:44:19,345][01629] Avg episode reward: [(0, '18.591')]
[2024-10-03 23:44:20,492][03614] Updated weights for policy 0, policy_version 630 (0.0024)
[2024-10-03 23:44:24,340][01629] Fps is (10 sec: 4097.7, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2592768. Throughput: 0: 1001.1. Samples: 648796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:44:24,343][01629] Avg episode reward: [(0, '19.631')]
[2024-10-03 23:44:24,356][03601] Saving new best policy, reward=19.631!
[2024-10-03 23:44:29,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2609152. Throughput: 0: 966.6. Samples: 650754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:44:29,342][01629] Avg episode reward: [(0, '19.471')]
[2024-10-03 23:44:31,842][03614] Updated weights for policy 0, policy_version 640 (0.0023)
[2024-10-03 23:44:34,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2629632.
Throughput: 0: 946.9. Samples: 656946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:44:34,342][01629] Avg episode reward: [(0, '20.614')]
[2024-10-03 23:44:34,412][03601] Saving new best policy, reward=20.614!
[2024-10-03 23:44:39,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2650112. Throughput: 0: 1005.9. Samples: 663686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-10-03 23:44:39,343][01629] Avg episode reward: [(0, '20.909')]
[2024-10-03 23:44:39,348][03601] Saving new best policy, reward=20.909!
[2024-10-03 23:44:42,421][03614] Updated weights for policy 0, policy_version 650 (0.0039)
[2024-10-03 23:44:44,348][01629] Fps is (10 sec: 3683.6, 60 sec: 3822.4, 300 sec: 3790.4). Total num frames: 2666496. Throughput: 0: 981.8. Samples: 665716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:44:44,350][01629] Avg episode reward: [(0, '20.886')]
[2024-10-03 23:44:49,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3823.4, 300 sec: 3818.3). Total num frames: 2686976. Throughput: 0: 939.8. Samples: 670874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:44:49,345][01629] Avg episode reward: [(0, '21.474')]
[2024-10-03 23:44:49,349][03601] Saving new best policy, reward=21.474!
[2024-10-03 23:44:54,341][01629] Fps is (10 sec: 3279.1, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2699264. Throughput: 0: 928.7. Samples: 675400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:44:54,346][01629] Avg episode reward: [(0, '21.349')]
[2024-10-03 23:44:55,304][03614] Updated weights for policy 0, policy_version 660 (0.0049)
[2024-10-03 23:44:59,340][01629] Fps is (10 sec: 2457.6, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2711552. Throughput: 0: 922.7. Samples: 677172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:44:59,345][01629] Avg episode reward: [(0, '22.102')]
[2024-10-03 23:44:59,352][03601] Saving new best policy, reward=22.102!
[2024-10-03 23:45:04,340][01629] Fps is (10 sec: 2048.1, 60 sec: 3345.1, 300 sec: 3721.1). Total num frames: 2719744. Throughput: 0: 838.0. Samples: 680340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:45:04,343][01629] Avg episode reward: [(0, '21.014')]
[2024-10-03 23:45:09,340][01629] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 2740224. Throughput: 0: 814.2. Samples: 685436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:45:09,343][01629] Avg episode reward: [(0, '21.546')]
[2024-10-03 23:45:09,738][03614] Updated weights for policy 0, policy_version 670 (0.0031)
[2024-10-03 23:45:14,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3550.1, 300 sec: 3735.0). Total num frames: 2764800. Throughput: 0: 848.5. Samples: 688936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:45:14,344][01629] Avg episode reward: [(0, '20.641')]
[2024-10-03 23:45:19,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3721.1). Total num frames: 2781184. Throughput: 0: 850.8. Samples: 695232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:45:19,345][01629] Avg episode reward: [(0, '20.360')]
[2024-10-03 23:45:19,870][03614] Updated weights for policy 0, policy_version 680 (0.0029)
[2024-10-03 23:45:24,340][01629] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3735.0). Total num frames: 2797568. Throughput: 0: 804.7. Samples: 699896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:45:24,343][01629] Avg episode reward: [(0, '22.001')]
[2024-10-03 23:45:29,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 2822144. Throughput: 0: 837.9. Samples: 703416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:45:29,346][01629] Avg episode reward: [(0, '21.298')]
[2024-10-03 23:45:29,869][03614] Updated weights for policy 0, policy_version 690 (0.0032)
[2024-10-03 23:45:34,344][01629] Fps is (10 sec: 4504.0, 60 sec: 3549.6, 300 sec: 3735.0).
Total num frames: 2842624. Throughput: 0: 878.1. Samples: 710392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:45:34,346][01629] Avg episode reward: [(0, '21.345')]
[2024-10-03 23:45:39,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3721.2). Total num frames: 2854912. Throughput: 0: 871.6. Samples: 714622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:45:39,343][01629] Avg episode reward: [(0, '22.829')]
[2024-10-03 23:45:39,351][03601] Saving new best policy, reward=22.829!
[2024-10-03 23:45:41,720][03614] Updated weights for policy 0, policy_version 700 (0.0032)
[2024-10-03 23:45:44,340][01629] Fps is (10 sec: 3687.8, 60 sec: 3550.3, 300 sec: 3735.0). Total num frames: 2879488. Throughput: 0: 893.7. Samples: 717388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:45:44,345][01629] Avg episode reward: [(0, '24.496')]
[2024-10-03 23:45:44,359][03601] Saving new best policy, reward=24.496!
[2024-10-03 23:45:49,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2899968. Throughput: 0: 977.8. Samples: 724340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:45:49,342][01629] Avg episode reward: [(0, '22.938')]
[2024-10-03 23:45:50,403][03614] Updated weights for policy 0, policy_version 710 (0.0027)
[2024-10-03 23:45:54,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3721.1). Total num frames: 2916352. Throughput: 0: 985.5. Samples: 729784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 23:45:54,345][01629] Avg episode reward: [(0, '23.384')]
[2024-10-03 23:45:59,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2936832. Throughput: 0: 954.3. Samples: 731880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:45:59,348][01629] Avg episode reward: [(0, '22.643')]
[2024-10-03 23:46:01,863][03614] Updated weights for policy 0, policy_version 720 (0.0020)
[2024-10-03 23:46:04,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3721.1). Total num frames: 2957312. Throughput: 0: 965.6. Samples: 738684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 23:46:04,346][01629] Avg episode reward: [(0, '23.477')]
[2024-10-03 23:46:09,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3721.1). Total num frames: 2977792. Throughput: 0: 1002.7. Samples: 745018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 23:46:09,343][01629] Avg episode reward: [(0, '22.892')]
[2024-10-03 23:46:13,155][03614] Updated weights for policy 0, policy_version 730 (0.0026)
[2024-10-03 23:46:14,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2990080. Throughput: 0: 968.5. Samples: 746998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:46:14,343][01629] Avg episode reward: [(0, '23.441')]
[2024-10-03 23:46:14,358][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth...
[2024-10-03 23:46:14,507][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000508_2080768.pth
[2024-10-03 23:46:19,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3014656. Throughput: 0: 943.1. Samples: 752830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:46:19,348][01629] Avg episode reward: [(0, '25.419')]
[2024-10-03 23:46:19,355][03601] Saving new best policy, reward=25.419!
[2024-10-03 23:46:22,469][03614] Updated weights for policy 0, policy_version 740 (0.0029)
[2024-10-03 23:46:24,341][01629] Fps is (10 sec: 4914.9, 60 sec: 4027.7, 300 sec: 3776.6). Total num frames: 3039232. Throughput: 0: 1002.6. Samples: 759738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:46:24,343][01629] Avg episode reward: [(0, '23.942')]
[2024-10-03 23:46:29,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3051520. Throughput: 0: 995.2. Samples: 762174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:46:29,343][01629] Avg episode reward: [(0, '23.789')]
[2024-10-03 23:46:33,953][03614] Updated weights for policy 0, policy_version 750 (0.0037)
[2024-10-03 23:46:34,340][01629] Fps is (10 sec: 3277.0, 60 sec: 3823.2, 300 sec: 3776.7). Total num frames: 3072000. Throughput: 0: 949.1. Samples: 767050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:46:34,345][01629] Avg episode reward: [(0, '22.951')]
[2024-10-03 23:46:39,340][01629] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3776.6). Total num frames: 3096576. Throughput: 0: 986.8. Samples: 774192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 23:46:39,347][01629] Avg episode reward: [(0, '22.134')]
[2024-10-03 23:46:43,738][03614] Updated weights for policy 0, policy_version 760 (0.0028)
[2024-10-03 23:46:44,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3112960. Throughput: 0: 1010.3. Samples: 777342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 23:46:44,344][01629] Avg episode reward: [(0, '22.313')]
[2024-10-03 23:46:49,340][01629] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3129344. Throughput: 0: 954.2. Samples: 781622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 23:46:49,342][01629] Avg episode reward: [(0, '22.412')]
[2024-10-03 23:46:53,959][03614] Updated weights for policy 0, policy_version 770 (0.0022)
[2024-10-03 23:46:54,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3790.6). Total num frames: 3153920. Throughput: 0: 966.7. Samples: 788520.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:46:54,343][01629] Avg episode reward: [(0, '22.350')] [2024-10-03 23:46:59,347][01629] Fps is (10 sec: 4502.6, 60 sec: 3959.0, 300 sec: 3804.3). Total num frames: 3174400. Throughput: 0: 1002.3. Samples: 792110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:46:59,352][01629] Avg episode reward: [(0, '22.313')] [2024-10-03 23:47:04,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3190784. Throughput: 0: 981.9. Samples: 797016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-03 23:47:04,344][01629] Avg episode reward: [(0, '23.335')] [2024-10-03 23:47:05,465][03614] Updated weights for policy 0, policy_version 780 (0.0037) [2024-10-03 23:47:09,340][01629] Fps is (10 sec: 3688.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3211264. Throughput: 0: 963.1. Samples: 803076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-03 23:47:09,343][01629] Avg episode reward: [(0, '22.213')] [2024-10-03 23:47:14,340][01629] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3231744. Throughput: 0: 984.9. Samples: 806496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:47:14,343][01629] Avg episode reward: [(0, '23.019')] [2024-10-03 23:47:14,419][03614] Updated weights for policy 0, policy_version 790 (0.0030) [2024-10-03 23:47:19,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.5). Total num frames: 3248128. Throughput: 0: 1005.1. Samples: 812280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-03 23:47:19,344][01629] Avg episode reward: [(0, '24.053')] [2024-10-03 23:47:24,340][01629] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 3268608. Throughput: 0: 958.6. Samples: 817328. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:24,346][01629] Avg episode reward: [(0, '24.896')] [2024-10-03 23:47:25,736][03614] Updated weights for policy 0, policy_version 800 (0.0031) [2024-10-03 23:47:29,340][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3293184. Throughput: 0: 965.4. Samples: 820784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:29,346][01629] Avg episode reward: [(0, '24.829')] [2024-10-03 23:47:34,340][01629] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3309568. Throughput: 0: 1022.3. Samples: 827626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:47:34,345][01629] Avg episode reward: [(0, '24.721')] [2024-10-03 23:47:35,976][03614] Updated weights for policy 0, policy_version 810 (0.0038) [2024-10-03 23:47:39,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3325952. Throughput: 0: 964.5. Samples: 831922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:39,346][01629] Avg episode reward: [(0, '24.456')] [2024-10-03 23:47:44,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3342336. Throughput: 0: 950.2. Samples: 834862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:44,346][01629] Avg episode reward: [(0, '24.653')] [2024-10-03 23:47:49,143][03614] Updated weights for policy 0, policy_version 820 (0.0036) [2024-10-03 23:47:49,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3358720. Throughput: 0: 934.4. Samples: 839064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:49,345][01629] Avg episode reward: [(0, '23.951')] [2024-10-03 23:47:54,340][01629] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3371008. Throughput: 0: 897.2. Samples: 843452. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:54,348][01629] Avg episode reward: [(0, '23.133')] [2024-10-03 23:47:59,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.5, 300 sec: 3776.7). Total num frames: 3391488. Throughput: 0: 873.3. Samples: 845796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:47:59,342][01629] Avg episode reward: [(0, '23.101')] [2024-10-03 23:48:00,573][03614] Updated weights for policy 0, policy_version 830 (0.0041) [2024-10-03 23:48:04,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 3416064. Throughput: 0: 901.1. Samples: 852830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:48:04,343][01629] Avg episode reward: [(0, '24.638')] [2024-10-03 23:48:09,342][01629] Fps is (10 sec: 4095.3, 60 sec: 3686.3, 300 sec: 3762.8). Total num frames: 3432448. Throughput: 0: 918.8. Samples: 858676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:48:09,346][01629] Avg episode reward: [(0, '26.452')] [2024-10-03 23:48:09,354][03601] Saving new best policy, reward=26.452! [2024-10-03 23:48:11,731][03614] Updated weights for policy 0, policy_version 840 (0.0049) [2024-10-03 23:48:14,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 3448832. Throughput: 0: 886.4. Samples: 860670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:48:14,345][01629] Avg episode reward: [(0, '24.756')] [2024-10-03 23:48:14,355][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000842_3448832.pth... [2024-10-03 23:48:14,482][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000623_2551808.pth [2024-10-03 23:48:19,340][01629] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3469312. Throughput: 0: 869.3. Samples: 866744. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-03 23:48:19,347][01629] Avg episode reward: [(0, '23.024')] [2024-10-03 23:48:23,564][03614] Updated weights for policy 0, policy_version 850 (0.0045) [2024-10-03 23:48:24,346][01629] Fps is (10 sec: 3275.0, 60 sec: 3549.5, 300 sec: 3734.9). Total num frames: 3481600. Throughput: 0: 867.7. Samples: 870974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-03 23:48:24,355][01629] Avg episode reward: [(0, '22.891')] [2024-10-03 23:48:29,340][01629] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3721.1). Total num frames: 3493888. Throughput: 0: 843.8. Samples: 872834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-03 23:48:29,342][01629] Avg episode reward: [(0, '22.923')] [2024-10-03 23:48:34,340][01629] Fps is (10 sec: 3278.6, 60 sec: 3413.3, 300 sec: 3735.0). Total num frames: 3514368. Throughput: 0: 859.6. Samples: 877748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:48:34,347][01629] Avg episode reward: [(0, '22.745')] [2024-10-03 23:48:35,757][03614] Updated weights for policy 0, policy_version 860 (0.0034) [2024-10-03 23:48:39,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 3538944. Throughput: 0: 917.1. Samples: 884720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:48:39,348][01629] Avg episode reward: [(0, '22.220')] [2024-10-03 23:48:44,342][01629] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3721.2). Total num frames: 3555328. Throughput: 0: 934.8. Samples: 887864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:48:44,350][01629] Avg episode reward: [(0, '23.973')] [2024-10-03 23:48:46,825][03614] Updated weights for policy 0, policy_version 870 (0.0016) [2024-10-03 23:48:49,340][01629] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 3571712. Throughput: 0: 873.4. Samples: 892134. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-03 23:48:49,342][01629] Avg episode reward: [(0, '24.273')] [2024-10-03 23:48:54,340][01629] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3596288. Throughput: 0: 894.7. Samples: 898936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:48:54,346][01629] Avg episode reward: [(0, '24.760')] [2024-10-03 23:48:56,112][03614] Updated weights for policy 0, policy_version 880 (0.0031) [2024-10-03 23:48:59,340][01629] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3616768. Throughput: 0: 930.2. Samples: 902528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-03 23:48:59,342][01629] Avg episode reward: [(0, '25.331')] [2024-10-03 23:49:04,344][01629] Fps is (10 sec: 3275.4, 60 sec: 3549.6, 300 sec: 3721.1). Total num frames: 3629056. Throughput: 0: 906.8. Samples: 907556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:49:04,351][01629] Avg episode reward: [(0, '25.365')] [2024-10-03 23:49:07,169][03614] Updated weights for policy 0, policy_version 890 (0.0022) [2024-10-03 23:49:09,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3735.0). Total num frames: 3653632. Throughput: 0: 948.5. Samples: 913652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:49:09,347][01629] Avg episode reward: [(0, '25.426')] [2024-10-03 23:49:14,340][01629] Fps is (10 sec: 4507.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3674112. Throughput: 0: 982.9. Samples: 917064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-03 23:49:14,346][01629] Avg episode reward: [(0, '25.304')] [2024-10-03 23:49:16,477][03614] Updated weights for policy 0, policy_version 900 (0.0042) [2024-10-03 23:49:19,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3690496. Throughput: 0: 1005.6. Samples: 923000. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:49:19,343][01629] Avg episode reward: [(0, '26.609')] [2024-10-03 23:49:19,346][03601] Saving new best policy, reward=26.609! [2024-10-03 23:49:24,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3735.0). Total num frames: 3710976. Throughput: 0: 958.6. Samples: 927856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:49:24,342][01629] Avg episode reward: [(0, '26.974')] [2024-10-03 23:49:24,359][03601] Saving new best policy, reward=26.974! [2024-10-03 23:49:27,608][03614] Updated weights for policy 0, policy_version 910 (0.0022) [2024-10-03 23:49:29,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 3731456. Throughput: 0: 965.6. Samples: 931316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:49:29,343][01629] Avg episode reward: [(0, '27.357')] [2024-10-03 23:49:29,367][03601] Saving new best policy, reward=27.357! [2024-10-03 23:49:34,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 3751936. Throughput: 0: 1023.2. Samples: 938180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:49:34,345][01629] Avg episode reward: [(0, '27.854')] [2024-10-03 23:49:34,351][03601] Saving new best policy, reward=27.854! [2024-10-03 23:49:39,043][03614] Updated weights for policy 0, policy_version 920 (0.0034) [2024-10-03 23:49:39,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.1). Total num frames: 3768320. Throughput: 0: 967.6. Samples: 942480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:49:39,342][01629] Avg episode reward: [(0, '28.512')] [2024-10-03 23:49:39,350][03601] Saving new best policy, reward=28.512! [2024-10-03 23:49:44,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3735.0). Total num frames: 3788800. Throughput: 0: 956.1. Samples: 945552. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:49:44,347][01629] Avg episode reward: [(0, '27.482')] [2024-10-03 23:49:47,864][03614] Updated weights for policy 0, policy_version 930 (0.0025) [2024-10-03 23:49:49,343][01629] Fps is (10 sec: 4504.2, 60 sec: 4027.5, 300 sec: 3776.6). Total num frames: 3813376. Throughput: 0: 1004.1. Samples: 952738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:49:49,346][01629] Avg episode reward: [(0, '26.365')] [2024-10-03 23:49:54,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3829760. Throughput: 0: 978.8. Samples: 957700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-03 23:49:54,347][01629] Avg episode reward: [(0, '25.488')] [2024-10-03 23:49:59,280][03614] Updated weights for policy 0, policy_version 940 (0.0030) [2024-10-03 23:49:59,340][01629] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3850240. Throughput: 0: 954.6. Samples: 960022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-03 23:49:59,346][01629] Avg episode reward: [(0, '24.352')] [2024-10-03 23:50:04,340][01629] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 3832.2). Total num frames: 3870720. Throughput: 0: 982.4. Samples: 967210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-03 23:50:04,343][01629] Avg episode reward: [(0, '23.041')] [2024-10-03 23:50:08,903][03614] Updated weights for policy 0, policy_version 950 (0.0017) [2024-10-03 23:50:09,340][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3891200. Throughput: 0: 1006.0. Samples: 973124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:50:09,343][01629] Avg episode reward: [(0, '23.919')] [2024-10-03 23:50:14,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3907584. Throughput: 0: 975.0. Samples: 975192. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-03 23:50:14,342][01629] Avg episode reward: [(0, '23.420')] [2024-10-03 23:50:14,357][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000954_3907584.pth... [2024-10-03 23:50:14,500][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth [2024-10-03 23:50:19,340][01629] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3928064. Throughput: 0: 963.7. Samples: 981548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:50:19,351][01629] Avg episode reward: [(0, '23.225')] [2024-10-03 23:50:19,412][03614] Updated weights for policy 0, policy_version 960 (0.0033) [2024-10-03 23:50:24,340][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3952640. Throughput: 0: 1017.6. Samples: 988274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-03 23:50:24,342][01629] Avg episode reward: [(0, '22.767')] [2024-10-03 23:50:29,342][01629] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 3964928. Throughput: 0: 995.4. Samples: 990348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-03 23:50:29,345][01629] Avg episode reward: [(0, '22.303')] [2024-10-03 23:50:31,100][03614] Updated weights for policy 0, policy_version 970 (0.0018) [2024-10-03 23:50:34,340][01629] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3981312. Throughput: 0: 937.5. Samples: 994922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-03 23:50:34,345][01629] Avg episode reward: [(0, '23.371')] [2024-10-03 23:50:39,340][01629] Fps is (10 sec: 3277.5, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3997696. Throughput: 0: 934.3. Samples: 999744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-03 23:50:39,347][01629] Avg episode reward: [(0, '22.225')] [2024-10-03 23:50:41,142][03601] Stopping Batcher_0... 
[2024-10-03 23:50:41,143][03601] Loop batcher_evt_loop terminating...
[2024-10-03 23:50:41,144][01629] Component Batcher_0 stopped!
[2024-10-03 23:50:41,151][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 23:50:41,251][03614] Weights refcount: 2 0
[2024-10-03 23:50:41,259][01629] Component InferenceWorker_p0-w0 stopped!
[2024-10-03 23:50:41,266][03614] Stopping InferenceWorker_p0-w0...
[2024-10-03 23:50:41,266][03614] Loop inference_proc0-0_evt_loop terminating...
[2024-10-03 23:50:41,391][03601] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000842_3448832.pth
[2024-10-03 23:50:41,423][03601] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 23:50:41,659][03601] Stopping LearnerWorker_p0...
[2024-10-03 23:50:41,661][03601] Loop learner_proc0_evt_loop terminating...
[2024-10-03 23:50:41,663][01629] Component LearnerWorker_p0 stopped!
[2024-10-03 23:50:41,737][01629] Component RolloutWorker_w1 stopped!
[2024-10-03 23:50:41,742][03616] Stopping RolloutWorker_w1...
[2024-10-03 23:50:41,742][03616] Loop rollout_proc1_evt_loop terminating...
[2024-10-03 23:50:41,792][01629] Component RolloutWorker_w3 stopped!
[2024-10-03 23:50:41,798][03618] Stopping RolloutWorker_w3...
[2024-10-03 23:50:41,799][03618] Loop rollout_proc3_evt_loop terminating...
[2024-10-03 23:50:41,811][01629] Component RolloutWorker_w7 stopped!
[2024-10-03 23:50:41,817][03622] Stopping RolloutWorker_w7...
[2024-10-03 23:50:41,817][03622] Loop rollout_proc7_evt_loop terminating...
[2024-10-03 23:50:41,874][01629] Component RolloutWorker_w5 stopped!
[2024-10-03 23:50:41,880][03620] Stopping RolloutWorker_w5...
[2024-10-03 23:50:41,880][03620] Loop rollout_proc5_evt_loop terminating...
[2024-10-03 23:50:41,996][03617] Stopping RolloutWorker_w2...
[2024-10-03 23:50:41,996][01629] Component RolloutWorker_w2 stopped!
[2024-10-03 23:50:41,997][03617] Loop rollout_proc2_evt_loop terminating...
[2024-10-03 23:50:42,033][03615] Stopping RolloutWorker_w0...
[2024-10-03 23:50:42,035][03615] Loop rollout_proc0_evt_loop terminating...
[2024-10-03 23:50:42,031][01629] Component RolloutWorker_w0 stopped!
[2024-10-03 23:50:42,050][03621] Stopping RolloutWorker_w6...
[2024-10-03 23:50:42,050][01629] Component RolloutWorker_w6 stopped!
[2024-10-03 23:50:42,051][03621] Loop rollout_proc6_evt_loop terminating...
[2024-10-03 23:50:42,092][03619] Stopping RolloutWorker_w4...
[2024-10-03 23:50:42,096][03619] Loop rollout_proc4_evt_loop terminating...
[2024-10-03 23:50:42,092][01629] Component RolloutWorker_w4 stopped!
[2024-10-03 23:50:42,099][01629] Waiting for process learner_proc0 to stop...
[2024-10-03 23:50:43,922][01629] Waiting for process inference_proc0-0 to join...
[2024-10-03 23:50:44,016][01629] Waiting for process rollout_proc0 to join...
[2024-10-03 23:50:46,310][01629] Waiting for process rollout_proc1 to join...
[2024-10-03 23:50:46,315][01629] Waiting for process rollout_proc2 to join...
[2024-10-03 23:50:46,319][01629] Waiting for process rollout_proc3 to join...
[2024-10-03 23:50:46,324][01629] Waiting for process rollout_proc4 to join...
[2024-10-03 23:50:46,328][01629] Waiting for process rollout_proc5 to join...
[2024-10-03 23:50:46,330][01629] Waiting for process rollout_proc6 to join...
[2024-10-03 23:50:46,334][01629] Waiting for process rollout_proc7 to join...
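The Saving/Removing pairs earlier in the log (e.g. saving checkpoint_000000954 while removing checkpoint_000000730, with checkpoint_000000842 still on disk) show the trainer keeping only the newest few checkpoints. Sample Factory's actual implementation differs in detail; the following is a minimal illustrative sketch of that keep-newest-N rotation, relying on the zero-padded version number in the filename to make lexicographic order match version order:

```python
from pathlib import Path

def rotate_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
    """Delete all but the `keep` newest checkpoint_*.pth files.

    Filenames like checkpoint_000000954_3907584.pth sort correctly as
    plain strings because the version field is zero-padded.
    """
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for stale in ckpts[:-keep]:
        print(f"Removing {stale}")  # mirrors the "Removing ..." log lines
        stale.unlink()
```

With `keep=2` this reproduces the behavior visible above: after the 954 checkpoint is written, the oldest of the three (730) is deleted and 842 survives.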
[2024-10-03 23:50:46,337][01629] Batcher 0 profile tree view:
batching: 28.0110, releasing_batches: 0.0258
[2024-10-03 23:50:46,339][01629] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0034
  wait_policy_total: 418.8501
update_model: 8.8562
  weight_update: 0.0029
one_step: 0.0138
  handle_policy_step: 607.6965
    deserialize: 14.8694, stack: 3.1926, obs_to_device_normalize: 123.6712, forward: 323.3785, send_messages: 29.3409
    prepare_outputs: 84.2407
      to_cpu: 49.0477
[2024-10-03 23:50:46,340][01629] Learner 0 profile tree view:
misc: 0.0060, prepare_batch: 14.1938
train: 74.7202
  epoch_init: 0.0060, minibatch_init: 0.0068, losses_postprocess: 0.5935, kl_divergence: 0.6826, after_optimizer: 33.5343
  calculate_losses: 26.7971
    losses_init: 0.0086, forward_head: 1.2718, bptt_initial: 17.4856, tail: 1.2384, advantages_returns: 0.2532, losses: 4.1175
    bptt: 2.1198
      bptt_forward_core: 2.0076
  update: 12.4252
    clip: 0.9666
[2024-10-03 23:50:46,341][01629] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3545, enqueue_policy_requests: 101.9525, env_step: 838.4041, overhead: 13.1148, complete_rollouts: 8.2872
save_policy_outputs: 21.2515
  split_output_tensors: 8.5897
[2024-10-03 23:50:46,344][01629] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4002, enqueue_policy_requests: 103.8871, env_step: 836.8004, overhead: 13.5128, complete_rollouts: 6.7115
save_policy_outputs: 20.6950
  split_output_tensors: 8.3762
[2024-10-03 23:50:46,345][01629] Loop Runner_EvtLoop terminating...
[2024-10-03 23:50:46,347][01629] Runner profile tree view:
main_loop: 1106.7857
[2024-10-03 23:50:46,348][01629] Collected {0: 4005888}, FPS: 3619.4
[2024-10-03 23:50:46,387][01629] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-03 23:50:46,388][01629] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-03 23:50:46,389][01629] Adding new argument 'no_render'=True that is not in the saved config file!
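The run summary's "FPS: 3619.4" is simply the total collected frames divided by the main loop's wall-clock time, both of which the runner logs explicitly. A quick sanity check with the figures taken verbatim from the log:

```python
# Figures copied from the log's run summary:
#   Collected {0: 4005888}, FPS: 3619.4
#   Runner profile tree view: main_loop: 1106.7857
collected_frames = 4005888
main_loop_seconds = 1106.7857

fps = collected_frames / main_loop_seconds
print(f"{fps:.1f}")  # agrees with the logged "FPS: 3619.4"
```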
[2024-10-03 23:50:46,391][01629] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-03 23:50:46,392][01629] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-03 23:50:46,393][01629] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-03 23:50:46,394][01629] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-03 23:50:46,395][01629] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-03 23:50:46,396][01629] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-03 23:50:46,397][01629] Adding new argument 'hf_repository'='seangogo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-03 23:50:46,398][01629] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-03 23:50:46,399][01629] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-03 23:50:46,400][01629] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-03 23:50:46,401][01629] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-03 23:50:46,403][01629] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-03 23:50:46,443][01629] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 23:50:46,447][01629] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 23:50:46,449][01629] RunningMeanStd input shape: (1,)
[2024-10-03 23:50:46,466][01629] ConvEncoder: input_channels=3
[2024-10-03 23:50:46,569][01629] Conv encoder output size: 512
[2024-10-03 23:50:46,571][01629] Policy head output size: 512
[2024-10-03 23:50:46,750][01629] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 23:50:47,574][01629] Num frames 100...
[2024-10-03 23:50:47,699][01629] Num frames 200...
[2024-10-03 23:50:47,820][01629] Num frames 300...
[2024-10-03 23:50:47,942][01629] Num frames 400...
[2024-10-03 23:50:48,065][01629] Num frames 500...
[2024-10-03 23:50:48,195][01629] Num frames 600...
[2024-10-03 23:50:48,327][01629] Num frames 700...
[2024-10-03 23:50:48,450][01629] Num frames 800...
[2024-10-03 23:50:48,574][01629] Num frames 900...
[2024-10-03 23:50:48,702][01629] Num frames 1000...
[2024-10-03 23:50:48,826][01629] Num frames 1100...
[2024-10-03 23:50:48,952][01629] Num frames 1200...
[2024-10-03 23:50:49,073][01629] Num frames 1300...
[2024-10-03 23:50:49,207][01629] Num frames 1400...
[2024-10-03 23:50:49,340][01629] Num frames 1500...
[2024-10-03 23:50:49,469][01629] Num frames 1600...
[2024-10-03 23:50:49,601][01629] Num frames 1700...
[2024-10-03 23:50:49,730][01629] Avg episode rewards: #0: 49.599, true rewards: #0: 17.600
[2024-10-03 23:50:49,731][01629] Avg episode reward: 49.599, avg true_objective: 17.600
[2024-10-03 23:50:49,785][01629] Num frames 1800...
[2024-10-03 23:50:49,909][01629] Num frames 1900...
[2024-10-03 23:50:50,031][01629] Num frames 2000...
[2024-10-03 23:50:50,156][01629] Num frames 2100...
[2024-10-03 23:50:50,292][01629] Num frames 2200...
[2024-10-03 23:50:50,412][01629] Num frames 2300...
[2024-10-03 23:50:50,538][01629] Num frames 2400...
[2024-10-03 23:50:50,662][01629] Num frames 2500...
[2024-10-03 23:50:50,786][01629] Num frames 2600...
[2024-10-03 23:50:50,913][01629] Num frames 2700...
[2024-10-03 23:50:51,039][01629] Num frames 2800...
[2024-10-03 23:50:51,161][01629] Num frames 2900...
[2024-10-03 23:50:51,294][01629] Num frames 3000...
[2024-10-03 23:50:51,421][01629] Num frames 3100...
[2024-10-03 23:50:51,542][01629] Num frames 3200...
[2024-10-03 23:50:51,667][01629] Num frames 3300...
[2024-10-03 23:50:51,791][01629] Num frames 3400...
[2024-10-03 23:50:51,915][01629] Num frames 3500...
[2024-10-03 23:50:52,040][01629] Num frames 3600...
[2024-10-03 23:50:52,106][01629] Avg episode rewards: #0: 49.039, true rewards: #0: 18.040
[2024-10-03 23:50:52,107][01629] Avg episode reward: 49.039, avg true_objective: 18.040
[2024-10-03 23:50:52,230][01629] Num frames 3700...
[2024-10-03 23:50:52,361][01629] Num frames 3800...
[2024-10-03 23:50:52,486][01629] Num frames 3900...
[2024-10-03 23:50:52,607][01629] Num frames 4000...
[2024-10-03 23:50:52,726][01629] Num frames 4100...
[2024-10-03 23:50:52,848][01629] Num frames 4200...
[2024-10-03 23:50:52,977][01629] Num frames 4300...
[2024-10-03 23:50:53,103][01629] Num frames 4400...
[2024-10-03 23:50:53,231][01629] Num frames 4500...
[2024-10-03 23:50:53,361][01629] Num frames 4600...
[2024-10-03 23:50:53,487][01629] Num frames 4700...
[2024-10-03 23:50:53,611][01629] Num frames 4800...
[2024-10-03 23:50:53,736][01629] Num frames 4900...
[2024-10-03 23:50:53,856][01629] Num frames 5000...
[2024-10-03 23:50:53,978][01629] Num frames 5100...
[2024-10-03 23:50:54,102][01629] Num frames 5200...
[2024-10-03 23:50:54,231][01629] Num frames 5300...
[2024-10-03 23:50:54,352][01629] Num frames 5400...
[2024-10-03 23:50:54,482][01629] Num frames 5500...
[2024-10-03 23:50:54,614][01629] Num frames 5600...
[2024-10-03 23:50:54,781][01629] Num frames 5700...
[2024-10-03 23:50:54,852][01629] Avg episode rewards: #0: 52.026, true rewards: #0: 19.027
[2024-10-03 23:50:54,855][01629] Avg episode reward: 52.026, avg true_objective: 19.027
[2024-10-03 23:50:55,009][01629] Num frames 5800...
[2024-10-03 23:50:55,187][01629] Num frames 5900...
[2024-10-03 23:50:55,359][01629] Num frames 6000...
[2024-10-03 23:50:55,538][01629] Num frames 6100...
[2024-10-03 23:50:55,700][01629] Num frames 6200...
[2024-10-03 23:50:55,868][01629] Num frames 6300...
[2024-10-03 23:50:56,047][01629] Num frames 6400...
[2024-10-03 23:50:56,227][01629] Num frames 6500...
[2024-10-03 23:50:56,400][01629] Num frames 6600...
[2024-10-03 23:50:56,582][01629] Num frames 6700...
[2024-10-03 23:50:56,762][01629] Num frames 6800...
[2024-10-03 23:50:56,940][01629] Num frames 6900...
[2024-10-03 23:50:57,079][01629] Num frames 7000...
[2024-10-03 23:50:57,205][01629] Num frames 7100...
[2024-10-03 23:50:57,327][01629] Num frames 7200...
[2024-10-03 23:50:57,451][01629] Num frames 7300...
[2024-10-03 23:50:57,582][01629] Num frames 7400...
[2024-10-03 23:50:57,729][01629] Avg episode rewards: #0: 51.927, true rewards: #0: 18.678
[2024-10-03 23:50:57,731][01629] Avg episode reward: 51.927, avg true_objective: 18.678
[2024-10-03 23:50:57,768][01629] Num frames 7500...
[2024-10-03 23:50:57,888][01629] Num frames 7600...
[2024-10-03 23:50:58,014][01629] Num frames 7700...
[2024-10-03 23:50:58,136][01629] Num frames 7800...
[2024-10-03 23:50:58,264][01629] Num frames 7900...
[2024-10-03 23:50:58,386][01629] Num frames 8000...
[2024-10-03 23:50:58,509][01629] Num frames 8100...
[2024-10-03 23:50:58,638][01629] Num frames 8200...
[2024-10-03 23:50:58,761][01629] Num frames 8300...
[2024-10-03 23:50:58,861][01629] Avg episode rewards: #0: 45.069, true rewards: #0: 16.670
[2024-10-03 23:50:58,862][01629] Avg episode reward: 45.069, avg true_objective: 16.670
[2024-10-03 23:50:58,942][01629] Num frames 8400...
[2024-10-03 23:50:59,076][01629] Num frames 8500...
[2024-10-03 23:50:59,203][01629] Num frames 8600...
[2024-10-03 23:50:59,327][01629] Num frames 8700...
[2024-10-03 23:50:59,449][01629] Num frames 8800...
[2024-10-03 23:50:59,577][01629] Num frames 8900...
[2024-10-03 23:50:59,699][01629] Num frames 9000...
[2024-10-03 23:50:59,822][01629] Num frames 9100...
[2024-10-03 23:50:59,945][01629] Num frames 9200...
[2024-10-03 23:51:00,075][01629] Num frames 9300...
[2024-10-03 23:51:00,206][01629] Num frames 9400...
[2024-10-03 23:51:00,330][01629] Num frames 9500...
[2024-10-03 23:51:00,451][01629] Num frames 9600...
[2024-10-03 23:51:00,571][01629] Num frames 9700...
[2024-10-03 23:51:00,700][01629] Num frames 9800...
[2024-10-03 23:51:00,817][01629] Num frames 9900...
[2024-10-03 23:51:00,936][01629] Num frames 10000...
[2024-10-03 23:51:01,059][01629] Num frames 10100...
[2024-10-03 23:51:01,185][01629] Num frames 10200...
[2024-10-03 23:51:01,348][01629] Avg episode rewards: #0: 45.144, true rewards: #0: 17.145
[2024-10-03 23:51:01,349][01629] Avg episode reward: 45.144, avg true_objective: 17.145
[2024-10-03 23:51:01,370][01629] Num frames 10300...
[2024-10-03 23:51:01,494][01629] Num frames 10400...
[2024-10-03 23:51:01,618][01629] Num frames 10500...
[2024-10-03 23:51:01,744][01629] Num frames 10600...
[2024-10-03 23:51:01,866][01629] Num frames 10700...
[2024-10-03 23:51:01,986][01629] Num frames 10800...
[2024-10-03 23:51:02,109][01629] Num frames 10900...
[2024-10-03 23:51:02,246][01629] Num frames 11000...
[2024-10-03 23:51:02,369][01629] Num frames 11100...
[2024-10-03 23:51:02,492][01629] Num frames 11200...
[2024-10-03 23:51:02,616][01629] Num frames 11300...
[2024-10-03 23:51:02,748][01629] Num frames 11400...
[2024-10-03 23:51:02,916][01629] Avg episode rewards: #0: 42.419, true rewards: #0: 16.420
[2024-10-03 23:51:02,919][01629] Avg episode reward: 42.419, avg true_objective: 16.420
[2024-10-03 23:51:02,929][01629] Num frames 11500...
[2024-10-03 23:51:03,055][01629] Num frames 11600...
[2024-10-03 23:51:03,179][01629] Num frames 11700...
[2024-10-03 23:51:03,303][01629] Num frames 11800...
[2024-10-03 23:51:03,423][01629] Num frames 11900...
[2024-10-03 23:51:03,546][01629] Num frames 12000...
[2024-10-03 23:51:03,678][01629] Num frames 12100...
[2024-10-03 23:51:03,800][01629] Num frames 12200...
[2024-10-03 23:51:03,920][01629] Num frames 12300...
[2024-10-03 23:51:04,049][01629] Num frames 12400...
[2024-10-03 23:51:04,171][01629] Num frames 12500...
[2024-10-03 23:51:04,296][01629] Num frames 12600...
[2024-10-03 23:51:04,419][01629] Num frames 12700...
[2024-10-03 23:51:04,540][01629] Num frames 12800...
[2024-10-03 23:51:04,664][01629] Num frames 12900...
[2024-10-03 23:51:04,808][01629] Avg episode rewards: #0: 41.582, true rewards: #0: 16.208
[2024-10-03 23:51:04,810][01629] Avg episode reward: 41.582, avg true_objective: 16.208
[2024-10-03 23:51:04,854][01629] Num frames 13000...
[2024-10-03 23:51:04,975][01629] Num frames 13100...
[2024-10-03 23:51:05,100][01629] Num frames 13200...
[2024-10-03 23:51:05,229][01629] Num frames 13300...
[2024-10-03 23:51:05,348][01629] Num frames 13400...
[2024-10-03 23:51:05,471][01629] Num frames 13500...
[2024-10-03 23:51:05,540][01629] Avg episode rewards: #0: 37.900, true rewards: #0: 15.011
[2024-10-03 23:51:05,541][01629] Avg episode reward: 37.900, avg true_objective: 15.011
[2024-10-03 23:51:05,649][01629] Num frames 13600...
[2024-10-03 23:51:05,779][01629] Num frames 13700...
[2024-10-03 23:51:05,904][01629] Num frames 13800...
[2024-10-03 23:51:06,025][01629] Num frames 13900...
[2024-10-03 23:51:06,152][01629] Num frames 14000...
[2024-10-03 23:51:06,280][01629] Num frames 14100...
[2024-10-03 23:51:06,402][01629] Num frames 14200...
[2024-10-03 23:51:06,520][01629] Num frames 14300...
[2024-10-03 23:51:06,641][01629] Num frames 14400...
[2024-10-03 23:51:06,774][01629] Num frames 14500...
[2024-10-03 23:51:06,894][01629] Num frames 14600...
[2024-10-03 23:51:06,963][01629] Avg episode rewards: #0: 37.110, true rewards: #0: 14.610
[2024-10-03 23:51:06,964][01629] Avg episode reward: 37.110, avg true_objective: 14.610
[2024-10-03 23:52:36,587][01629] Replay video saved to /content/train_dir/default_experiment/replay.mp4!