[2024-09-01 14:50:18,637][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-01 14:50:18,643][00194] Rollout worker 0 uses device cpu
[2024-09-01 14:50:18,645][00194] Rollout worker 1 uses device cpu
[2024-09-01 14:50:18,646][00194] Rollout worker 2 uses device cpu
[2024-09-01 14:50:18,648][00194] Rollout worker 3 uses device cpu
[2024-09-01 14:50:18,649][00194] Rollout worker 4 uses device cpu
[2024-09-01 14:50:18,651][00194] Rollout worker 5 uses device cpu
[2024-09-01 14:50:18,653][00194] Rollout worker 6 uses device cpu
[2024-09-01 14:50:18,654][00194] Rollout worker 7 uses device cpu
[2024-09-01 14:50:18,826][00194] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 14:50:18,874][00194] Starting all processes...
[2024-09-01 14:50:18,879][00194] Starting process learner_proc0
[2024-09-01 14:50:18,932][00194] Starting all processes...
[2024-09-01 14:50:18,945][00194] Starting process inference_proc0-0
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc0
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc1
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc2
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc3
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc4
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc5
[2024-09-01 14:50:18,947][00194] Starting process rollout_proc6
[2024-09-01 14:50:18,947][00194] Starting process rollout_proc7
[2024-09-01 14:50:32,730][03021] Starting seed is not provided
[2024-09-01 14:50:32,732][03021] Initializing actor-critic model on device cpu
[2024-09-01 14:50:32,733][03021] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 14:50:32,735][03021] RunningMeanStd input shape: (1,)
[2024-09-01 14:50:32,820][03021] ConvEncoder: input_channels=3
[2024-09-01 14:50:33,363][03035] Worker 0 uses CPU cores [0]
[2024-09-01 14:50:33,505][03042] Worker 7 uses CPU cores [1]
[2024-09-01 14:50:33,519][03038] Worker 3 uses CPU cores [1]
[2024-09-01 14:50:33,572][03041] Worker 6 uses CPU cores [0]
[2024-09-01 14:50:33,653][03039] Worker 4 uses CPU cores [0]
[2024-09-01 14:50:33,669][03037] Worker 2 uses CPU cores [0]
[2024-09-01 14:50:33,694][03021] Conv encoder output size: 512
[2024-09-01 14:50:33,696][03021] Policy head output size: 512
[2024-09-01 14:50:33,724][03021] Created Actor Critic model with architecture:
[2024-09-01 14:50:33,728][03036] Worker 1 uses CPU cores [1]
[2024-09-01 14:50:33,726][03021] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 14:50:33,777][03040] Worker 5 uses CPU cores [1]
[2024-09-01 14:50:34,292][03021] Using optimizer
[2024-09-01 14:50:34,293][03021] No checkpoints found
[2024-09-01 14:50:34,294][03021] Did not load from checkpoint, starting from scratch!
[2024-09-01 14:50:34,294][03021] Initialized policy 0 weights for model version 0
[2024-09-01 14:50:34,297][03021] LearnerWorker_p0 finished initialization!
[2024-09-01 14:50:34,305][03034] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 14:50:34,307][03034] RunningMeanStd input shape: (1,)
[2024-09-01 14:50:34,333][03034] ConvEncoder: input_channels=3
[2024-09-01 14:50:34,490][03034] Conv encoder output size: 512
[2024-09-01 14:50:34,490][03034] Policy head output size: 512
[2024-09-01 14:50:34,512][00194] Inference worker 0-0 is ready!
[2024-09-01 14:50:34,514][00194] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 14:50:34,598][03038] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,599][03040] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,601][03042] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,597][03036] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,613][03035] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,610][03039] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,625][03037] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,627][03041] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:35,136][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:35,670][03037] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,360][03038] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,356][03040] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,363][03042] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,364][03036] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,675][03037] Decorrelating experience for 32 frames...
[2024-09-01 14:50:36,762][03041] Decorrelating experience for 0 frames...
[2024-09-01 14:50:37,247][03038] Decorrelating experience for 32 frames...
[2024-09-01 14:50:37,249][03036] Decorrelating experience for 32 frames...
[2024-09-01 14:50:37,842][03042] Decorrelating experience for 32 frames...
[2024-09-01 14:50:37,948][03039] Decorrelating experience for 0 frames...
[2024-09-01 14:50:38,398][03041] Decorrelating experience for 32 frames...
[2024-09-01 14:50:38,595][03037] Decorrelating experience for 64 frames...
[2024-09-01 14:50:38,817][00194] Heartbeat connected on Batcher_0
[2024-09-01 14:50:38,824][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 14:50:38,886][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 14:50:39,168][03036] Decorrelating experience for 64 frames...
[2024-09-01 14:50:39,403][03035] Decorrelating experience for 0 frames...
[2024-09-01 14:50:39,515][03042] Decorrelating experience for 64 frames...
[2024-09-01 14:50:40,136][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:40,414][03039] Decorrelating experience for 32 frames...
[2024-09-01 14:50:40,550][03038] Decorrelating experience for 64 frames...
[2024-09-01 14:50:41,222][03037] Decorrelating experience for 96 frames...
[2024-09-01 14:50:41,279][03040] Decorrelating experience for 32 frames...
[2024-09-01 14:50:41,444][03041] Decorrelating experience for 64 frames...
[2024-09-01 14:50:41,490][03042] Decorrelating experience for 96 frames...
[2024-09-01 14:50:41,575][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 14:50:41,671][03035] Decorrelating experience for 32 frames...
[2024-09-01 14:50:41,777][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 14:50:42,279][03038] Decorrelating experience for 96 frames...
[2024-09-01 14:50:42,726][03039] Decorrelating experience for 64 frames...
[2024-09-01 14:50:42,873][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 14:50:43,857][03036] Decorrelating experience for 96 frames...
[2024-09-01 14:50:44,758][00194] Heartbeat connected on RolloutWorker_w1
[2024-09-01 14:50:45,136][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 39.4. Samples: 394. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:45,144][00194] Avg episode reward: [(0, '1.813')]
[2024-09-01 14:50:46,376][03041] Decorrelating experience for 96 frames...
[2024-09-01 14:50:46,844][03039] Decorrelating experience for 96 frames...
[2024-09-01 14:50:47,585][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 14:50:48,589][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 14:50:48,663][03040] Decorrelating experience for 64 frames...
[2024-09-01 14:50:50,136][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 104.9. Samples: 1574. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:50,140][00194] Avg episode reward: [(0, '2.296')]
[2024-09-01 14:50:52,661][03035] Decorrelating experience for 64 frames...
[2024-09-01 14:50:53,342][03021] Signal inference workers to stop experience collection...
[2024-09-01 14:50:53,395][03034] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 14:50:53,479][03040] Decorrelating experience for 96 frames...
[2024-09-01 14:50:53,575][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 14:50:53,977][03035] Decorrelating experience for 96 frames...
[2024-09-01 14:50:54,080][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 14:50:54,301][03021] Signal inference workers to resume experience collection...
[2024-09-01 14:50:54,302][03034] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 14:50:55,136][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 109.6. Samples: 2192. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 14:50:55,138][00194] Avg episode reward: [(0, '2.547')]
[2024-09-01 14:51:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 8192. Throughput: 0: 141.1. Samples: 3528. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 14:51:00,142][00194] Avg episode reward: [(0, '3.212')]
[2024-09-01 14:51:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 12288. Throughput: 0: 164.7. Samples: 4940. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:05,139][00194] Avg episode reward: [(0, '3.274')]
[2024-09-01 14:51:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 16384. Throughput: 0: 159.8. Samples: 5592. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:10,141][00194] Avg episode reward: [(0, '3.525')]
[2024-09-01 14:51:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 20480. Throughput: 0: 174.2. Samples: 6970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:15,144][00194] Avg episode reward: [(0, '3.818')]
[2024-09-01 14:51:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 24576. Throughput: 0: 192.3. Samples: 8654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:20,140][00194] Avg episode reward: [(0, '3.829')]
[2024-09-01 14:51:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 28672. Throughput: 0: 201.0. Samples: 9046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:25,145][00194] Avg episode reward: [(0, '3.873')]
[2024-09-01 14:51:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 670.3, 300 sec: 670.3). Total num frames: 36864. Throughput: 0: 223.3. Samples: 10442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:30,138][00194] Avg episode reward: [(0, '3.971')]
[2024-09-01 14:51:34,366][03034] Updated weights for policy 0, policy_version 10 (0.2578)
[2024-09-01 14:51:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 40960. Throughput: 0: 229.3. Samples: 11892. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:35,139][00194] Avg episode reward: [(0, '4.138')]
[2024-09-01 14:51:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 693.2). Total num frames: 45056. Throughput: 0: 236.0. Samples: 12814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:40,141][00194] Avg episode reward: [(0, '4.322')]
[2024-09-01 14:51:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 49152. Throughput: 0: 228.5. Samples: 13812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:45,139][00194] Avg episode reward: [(0, '4.340')]
[2024-09-01 14:51:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 710.0). Total num frames: 53248. Throughput: 0: 229.9. Samples: 15286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:50,145][00194] Avg episode reward: [(0, '4.382')]
[2024-09-01 14:51:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 716.8). Total num frames: 57344. Throughput: 0: 235.4. Samples: 16184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:55,142][00194] Avg episode reward: [(0, '4.435')]
[2024-09-01 14:52:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 722.8). Total num frames: 61440. Throughput: 0: 232.4. Samples: 17426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:00,143][00194] Avg episode reward: [(0, '4.432')]
[2024-09-01 14:52:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 65536. Throughput: 0: 229.6. Samples: 18986. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:05,147][00194] Avg episode reward: [(0, '4.425')]
[2024-09-01 14:52:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 776.1). Total num frames: 73728. Throughput: 0: 236.8. Samples: 19700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:10,138][00194] Avg episode reward: [(0, '4.491')]
[2024-09-01 14:52:15,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 778.2). Total num frames: 77824. Throughput: 0: 235.0. Samples: 21018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:15,140][00194] Avg episode reward: [(0, '4.481')]
[2024-09-01 14:52:19,754][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000020_81920.pth...
[2024-09-01 14:52:19,758][03034] Updated weights for policy 0, policy_version 20 (0.0527)
[2024-09-01 14:52:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 780.2). Total num frames: 81920. Throughput: 0: 225.6. Samples: 22042. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:20,138][00194] Avg episode reward: [(0, '4.519')]
[2024-09-01 14:52:25,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 782.0). Total num frames: 86016. Throughput: 0: 227.3. Samples: 23044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:25,139][00194] Avg episode reward: [(0, '4.486')]
[2024-09-01 14:52:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 783.6). Total num frames: 90112. Throughput: 0: 240.9. Samples: 24652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:30,141][00194] Avg episode reward: [(0, '4.496')]
[2024-09-01 14:52:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 785.1). Total num frames: 94208. Throughput: 0: 233.6. Samples: 25800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:35,141][00194] Avg episode reward: [(0, '4.528')]
[2024-09-01 14:52:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 786.4). Total num frames: 98304. Throughput: 0: 223.5. Samples: 26242. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:40,145][00194] Avg episode reward: [(0, '4.456')]
[2024-09-01 14:52:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 106496. Throughput: 0: 238.5. Samples: 28160. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:45,151][00194] Avg episode reward: [(0, '4.433')]
[2024-09-01 14:52:49,237][03021] Saving new best policy, reward=4.433!
[2024-09-01 14:52:50,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 110592. Throughput: 0: 228.7. Samples: 29280. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:50,145][00194] Avg episode reward: [(0, '4.521')]
[2024-09-01 14:52:54,932][03021] Saving new best policy, reward=4.521!
[2024-09-01 14:52:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 114688. Throughput: 0: 228.9. Samples: 30000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:55,143][00194] Avg episode reward: [(0, '4.529')]
[2024-09-01 14:52:58,673][03021] Saving new best policy, reward=4.529!
[2024-09-01 14:53:00,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 118784. Throughput: 0: 227.0. Samples: 31234. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:00,138][00194] Avg episode reward: [(0, '4.515')]
[2024-09-01 14:53:02,571][03034] Updated weights for policy 0, policy_version 30 (0.0582)
[2024-09-01 14:53:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 122880. Throughput: 0: 246.7. Samples: 33144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:05,141][00194] Avg episode reward: [(0, '4.548')]
[2024-09-01 14:53:10,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 126976. Throughput: 0: 231.1. Samples: 33446. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:10,146][00194] Avg episode reward: [(0, '4.506')]
[2024-09-01 14:53:12,627][03021] Saving new best policy, reward=4.548!
[2024-09-01 14:53:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 131072. Throughput: 0: 222.7. Samples: 34672. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:15,144][00194] Avg episode reward: [(0, '4.561')]
[2024-09-01 14:53:20,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 135168. Throughput: 0: 236.1. Samples: 36424. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:20,143][00194] Avg episode reward: [(0, '4.552')]
[2024-09-01 14:53:20,380][03021] Saving new best policy, reward=4.561!
[2024-09-01 14:53:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 843.3). Total num frames: 143360. Throughput: 0: 247.9. Samples: 37396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:25,139][00194] Avg episode reward: [(0, '4.575')]
[2024-09-01 14:53:30,140][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 143360. Throughput: 0: 212.9. Samples: 37740. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:30,158][00194] Avg episode reward: [(0, '4.604')]
[2024-09-01 14:53:33,783][03021] Saving new best policy, reward=4.575!
[2024-09-01 14:53:33,922][03021] Saving new best policy, reward=4.604!
[2024-09-01 14:53:35,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 147456. Throughput: 0: 210.5. Samples: 38752. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:35,142][00194] Avg episode reward: [(0, '4.503')]
[2024-09-01 14:53:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 151552. Throughput: 0: 211.1. Samples: 39500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:40,139][00194] Avg episode reward: [(0, '4.544')]
[2024-09-01 14:53:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 155648. Throughput: 0: 215.8. Samples: 40944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:45,146][00194] Avg episode reward: [(0, '4.456')]
[2024-09-01 14:53:50,142][00194] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 159744. Throughput: 0: 196.4. Samples: 41982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:53:50,146][00194] Avg episode reward: [(0, '4.465')]
[2024-09-01 14:53:52,315][03034] Updated weights for policy 0, policy_version 40 (0.1638)
[2024-09-01 14:53:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 163840. Throughput: 0: 204.9. Samples: 42668. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:53:55,143][00194] Avg episode reward: [(0, '4.601')]
[2024-09-01 14:54:00,136][00194] Fps is (10 sec: 1229.6, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 172032. Throughput: 0: 220.0. Samples: 44574. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:54:00,142][00194] Avg episode reward: [(0, '4.532')]
[2024-09-01 14:54:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 172032. Throughput: 0: 203.4. Samples: 45576. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:54:05,138][00194] Avg episode reward: [(0, '4.487')]
[2024-09-01 14:54:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.3). Total num frames: 180224. Throughput: 0: 194.0. Samples: 46128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:10,146][00194] Avg episode reward: [(0, '4.467')]
[2024-09-01 14:54:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 837.8). Total num frames: 184320. Throughput: 0: 219.1. Samples: 47598. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:15,143][00194] Avg episode reward: [(0, '4.396')]
[2024-09-01 14:54:17,607][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000046_188416.pth...
[2024-09-01 14:54:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 837.4). Total num frames: 188416. Throughput: 0: 231.2. Samples: 49154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:20,143][00194] Avg episode reward: [(0, '4.467')]
[2024-09-01 14:54:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 837.0). Total num frames: 192512. Throughput: 0: 223.1. Samples: 49540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:25,144][00194] Avg episode reward: [(0, '4.449')]
[2024-09-01 14:54:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 836.6). Total num frames: 196608. Throughput: 0: 216.7. Samples: 50696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:30,139][00194] Avg episode reward: [(0, '4.457')]
[2024-09-01 14:54:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 836.3). Total num frames: 200704. Throughput: 0: 236.7. Samples: 52630. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:35,139][00194] Avg episode reward: [(0, '4.421')]
[2024-09-01 14:54:36,014][03034] Updated weights for policy 0, policy_version 50 (0.1682)
[2024-09-01 14:54:40,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 835.9). Total num frames: 204800. Throughput: 0: 229.6. Samples: 53002. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:40,148][00194] Avg episode reward: [(0, '4.315')]
[2024-09-01 14:54:40,302][03021] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 14:54:40,427][03034] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 14:54:41,141][03021] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 14:54:41,142][03034] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 14:54:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 835.6). Total num frames: 208896. Throughput: 0: 215.8. Samples: 54286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:45,141][00194] Avg episode reward: [(0, '4.328')]
[2024-09-01 14:54:50,136][00194] Fps is (10 sec: 1229.6, 60 sec: 955.8, 300 sec: 851.3). Total num frames: 217088. Throughput: 0: 225.6. Samples: 55728. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:50,140][00194] Avg episode reward: [(0, '4.300')]
[2024-09-01 14:54:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 850.7). Total num frames: 221184. Throughput: 0: 235.6. Samples: 56730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:55,144][00194] Avg episode reward: [(0, '4.263')]
[2024-09-01 14:55:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.1). Total num frames: 225280. Throughput: 0: 225.9. Samples: 57762. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:00,140][00194] Avg episode reward: [(0, '4.302')]
[2024-09-01 14:55:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 849.5). Total num frames: 229376. Throughput: 0: 222.1. Samples: 59150. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:05,138][00194] Avg episode reward: [(0, '4.240')]
[2024-09-01 14:55:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 849.0). Total num frames: 233472. Throughput: 0: 229.4. Samples: 59864. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:10,142][00194] Avg episode reward: [(0, '4.327')]
[2024-09-01 14:55:15,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 848.5). Total num frames: 237568. Throughput: 0: 239.0. Samples: 61452. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:15,140][00194] Avg episode reward: [(0, '4.340')]
[2024-09-01 14:55:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.9). Total num frames: 241664. Throughput: 0: 220.1. Samples: 62536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:20,139][00194] Avg episode reward: [(0, '4.356')]
[2024-09-01 14:55:21,561][03034] Updated weights for policy 0, policy_version 60 (0.0050)
[2024-09-01 14:55:25,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.4). Total num frames: 245760. Throughput: 0: 229.6. Samples: 63334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:25,145][00194] Avg episode reward: [(0, '4.346')]
[2024-09-01 14:55:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 860.9). Total num frames: 253952. Throughput: 0: 236.7. Samples: 64938. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:30,139][00194] Avg episode reward: [(0, '4.353')]
[2024-09-01 14:55:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 258048. Throughput: 0: 228.0. Samples: 65990. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:35,139][00194] Avg episode reward: [(0, '4.448')]
[2024-09-01 14:55:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 888.6). Total num frames: 262144. Throughput: 0: 219.6. Samples: 66614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:40,140][00194] Avg episode reward: [(0, '4.677')]
[2024-09-01 14:55:43,065][03021] Saving new best policy, reward=4.677!
[2024-09-01 14:55:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 266240. Throughput: 0: 233.9. Samples: 68286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:45,145][00194] Avg episode reward: [(0, '4.605')]
[2024-09-01 14:55:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 270336. Throughput: 0: 230.7. Samples: 69530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:50,140][00194] Avg episode reward: [(0, '4.674')]
[2024-09-01 14:55:55,141][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 274432. Throughput: 0: 227.6. Samples: 70106. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:55:55,149][00194] Avg episode reward: [(0, '4.661')]
[2024-09-01 14:56:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 278528. Throughput: 0: 226.2. Samples: 71630. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:00,144][00194] Avg episode reward: [(0, '4.700')]
[2024-09-01 14:56:04,897][03021] Saving new best policy, reward=4.700!
[2024-09-01 14:56:04,901][03034] Updated weights for policy 0, policy_version 70 (0.1759)
[2024-09-01 14:56:05,136][00194] Fps is (10 sec: 1229.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 286720. Throughput: 0: 237.5. Samples: 73224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:05,139][00194] Avg episode reward: [(0, '4.741')]
[2024-09-01 14:56:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 286720. Throughput: 0: 236.0. Samples: 73956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:10,139][00194] Avg episode reward: [(0, '4.723')]
[2024-09-01 14:56:10,600][03021] Saving new best policy, reward=4.741!
[2024-09-01 14:56:15,137][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 290816. Throughput: 0: 224.3. Samples: 75032. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:15,147][00194] Avg episode reward: [(0, '4.737')]
[2024-09-01 14:56:19,195][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2024-09-01 14:56:19,305][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000020_81920.pth
[2024-09-01 14:56:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 299008. Throughput: 0: 232.8. Samples: 76464. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:20,143][00194] Avg episode reward: [(0, '4.726')]
[2024-09-01 14:56:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 303104. Throughput: 0: 237.9. Samples: 77320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:25,139][00194] Avg episode reward: [(0, '4.575')]
[2024-09-01 14:56:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 307200. Throughput: 0: 221.5. Samples: 78252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:30,138][00194] Avg episode reward: [(0, '4.571')]
[2024-09-01 14:56:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 311296. Throughput: 0: 230.0. Samples: 79880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:35,145][00194] Avg episode reward: [(0, '4.496')]
[2024-09-01 14:56:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 315392. Throughput: 0: 229.3. Samples: 80424. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 14:56:40,143][00194] Avg episode reward: [(0, '4.360')]
[2024-09-01 14:56:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 319488. Throughput: 0: 231.6. Samples: 82050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 14:56:45,139][00194] Avg episode reward: [(0, '4.360')]
[2024-09-01 14:56:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 323584. Throughput: 0: 222.3. Samples: 83226. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:56:50,144][00194] Avg episode reward: [(0, '4.339')]
[2024-09-01 14:56:51,612][03034] Updated weights for policy 0, policy_version 80 (0.1042)
[2024-09-01 14:56:55,148][00194] Fps is (10 sec: 1227.4, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 331776. Throughput: 0: 218.5. Samples: 83792. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:56:55,153][00194] Avg episode reward: [(0, '4.330')]
[2024-09-01 14:57:00,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 335872. Throughput: 0: 232.3. Samples: 85486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:00,146][00194] Avg episode reward: [(0, '4.269')]
[2024-09-01 14:57:05,136][00194] Fps is (10 sec: 820.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 339968. Throughput: 0: 222.7. Samples: 86486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:05,143][00194] Avg episode reward: [(0, '4.248')]
[2024-09-01 14:57:10,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 344064. Throughput: 0: 219.2. Samples: 87186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:10,142][00194] Avg episode reward: [(0, '4.262')]
[2024-09-01 14:57:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 348160. Throughput: 0: 232.4. Samples: 88712. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:15,138][00194] Avg episode reward: [(0, '4.427')]
[2024-09-01 14:57:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 352256. Throughput: 0: 228.6. Samples: 90166. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:20,140][00194] Avg episode reward: [(0, '4.391')]
[2024-09-01 14:57:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 356352. Throughput: 0: 225.1. Samples: 90554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:25,139][00194] Avg episode reward: [(0, '4.470')]
[2024-09-01 14:57:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 360448. Throughput: 0: 225.0. Samples: 92176. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:30,144][00194] Avg episode reward: [(0, '4.480')]
[2024-09-01 14:57:34,731][03034] Updated weights for policy 0, policy_version 90 (0.1892)
[2024-09-01 14:57:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 368640. Throughput: 0: 232.0. Samples: 93668. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:35,139][00194] Avg episode reward: [(0, '4.562')]
[2024-09-01 14:57:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 368640. Throughput: 0: 234.6. Samples: 94344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:40,140][00194] Avg episode reward: [(0, '4.613')]
[2024-09-01 14:57:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 376832. Throughput: 0: 225.3. Samples: 95622. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:45,143][00194] Avg episode reward: [(0, '4.710')]
[2024-09-01 14:57:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 380928. Throughput: 0: 234.0. Samples: 97016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:50,139][00194] Avg episode reward: [(0, '4.616')]
[2024-09-01 14:57:55,140][00194] Fps is (10 sec: 818.9, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 385024. Throughput: 0: 234.4. Samples: 97734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:55,144][00194] Avg episode reward: [(0, '4.667')]
[2024-09-01 14:58:00,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 389120. Throughput: 0: 224.6. Samples: 98818. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:58:00,141][00194] Avg episode reward: [(0, '4.595')]
[2024-09-01 14:58:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 393216. Throughput: 0: 231.3. Samples: 100574. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:58:05,142][00194] Avg episode reward: [(0, '4.618')]
[2024-09-01 14:58:10,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 397312. Throughput: 0: 234.4. Samples: 101100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:58:10,145][00194] Avg episode reward: [(0, '4.595')]
[2024-09-01 14:58:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 401408. Throughput: 0: 234.8. Samples: 102744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:58:15,141][00194] Avg episode reward: [(0, '4.580')]
[2024-09-01 14:58:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 405504. Throughput: 0: 226.3. Samples: 103850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:58:20,144][00194] Avg episode reward: [(0, '4.550')]
[2024-09-01 14:58:20,574][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth...
[2024-09-01 14:58:20,580][03034] Updated weights for policy 0, policy_version 100 (0.1151)
[2024-09-01 14:58:20,680][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000046_188416.pth
[2024-09-01 14:58:22,878][03021] Signal inference workers to stop experience collection...
(100 times) [2024-09-01 14:58:22,935][03034] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-09-01 14:58:24,391][03021] Signal inference workers to resume experience collection... (100 times) [2024-09-01 14:58:24,392][03034] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-09-01 14:58:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 413696. Throughput: 0: 230.1. Samples: 104698. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:25,141][00194] Avg episode reward: [(0, '4.563')] [2024-09-01 14:58:30,141][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 417792. Throughput: 0: 230.9. Samples: 106016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:30,152][00194] Avg episode reward: [(0, '4.566')] [2024-09-01 14:58:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 421888. Throughput: 0: 225.6. Samples: 107168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 14:58:35,139][00194] Avg episode reward: [(0, '4.687')] [2024-09-01 14:58:40,136][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 425984. Throughput: 0: 228.2. Samples: 108000. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 14:58:40,143][00194] Avg episode reward: [(0, '4.638')] [2024-09-01 14:58:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 430080. Throughput: 0: 237.9. Samples: 109524. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:45,139][00194] Avg episode reward: [(0, '4.693')] [2024-09-01 14:58:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 434176. Throughput: 0: 230.9. Samples: 110966. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:50,141][00194] Avg episode reward: [(0, '4.693')] [2024-09-01 14:58:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 438272. Throughput: 0: 227.2. Samples: 111322. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:58:55,143][00194] Avg episode reward: [(0, '4.664')] [2024-09-01 14:59:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 442368. Throughput: 0: 222.4. Samples: 112750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:00,146][00194] Avg episode reward: [(0, '4.659')] [2024-09-01 14:59:04,236][03034] Updated weights for policy 0, policy_version 110 (0.1529) [2024-09-01 14:59:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 450560. Throughput: 0: 232.6. Samples: 114318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:05,140][00194] Avg episode reward: [(0, '4.645')] [2024-09-01 14:59:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 454656. Throughput: 0: 231.7. Samples: 115126. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:10,140][00194] Avg episode reward: [(0, '4.667')] [2024-09-01 14:59:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 458752. Throughput: 0: 227.5. Samples: 116250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:15,140][00194] Avg episode reward: [(0, '4.686')] [2024-09-01 14:59:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 462848. Throughput: 0: 235.5. Samples: 117764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:20,139][00194] Avg episode reward: [(0, '4.675')] [2024-09-01 14:59:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 466944. Throughput: 0: 229.6. Samples: 118330. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:25,146][00194] Avg episode reward: [(0, '4.773')] [2024-09-01 14:59:28,465][03021] Saving new best policy, reward=4.773! [2024-09-01 14:59:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 471040. Throughput: 0: 218.7. Samples: 119364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:30,142][00194] Avg episode reward: [(0, '4.739')] [2024-09-01 14:59:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 475136. Throughput: 0: 220.4. Samples: 120884. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:35,143][00194] Avg episode reward: [(0, '4.641')] [2024-09-01 14:59:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 479232. Throughput: 0: 228.1. Samples: 121586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:40,141][00194] Avg episode reward: [(0, '4.693')] [2024-09-01 14:59:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 483328. Throughput: 0: 229.9. Samples: 123094. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:45,139][00194] Avg episode reward: [(0, '4.684')] [2024-09-01 14:59:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 487424. Throughput: 0: 219.0. Samples: 124172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:50,143][00194] Avg episode reward: [(0, '4.749')] [2024-09-01 14:59:51,140][03034] Updated weights for policy 0, policy_version 120 (0.1018) [2024-09-01 14:59:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 495616. Throughput: 0: 218.4. Samples: 124956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:59:55,138][00194] Avg episode reward: [(0, '4.674')] [2024-09-01 15:00:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 499712. 
Throughput: 0: 224.7. Samples: 126362. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:00:00,142][00194] Avg episode reward: [(0, '4.494')] [2024-09-01 15:00:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 503808. Throughput: 0: 216.9. Samples: 127524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:05,140][00194] Avg episode reward: [(0, '4.494')] [2024-09-01 15:00:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 507904. Throughput: 0: 221.1. Samples: 128280. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:10,139][00194] Avg episode reward: [(0, '4.457')] [2024-09-01 15:00:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 512000. Throughput: 0: 231.3. Samples: 129772. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:15,138][00194] Avg episode reward: [(0, '4.464')] [2024-09-01 15:00:20,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 516096. Throughput: 0: 232.2. Samples: 131334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:20,145][00194] Avg episode reward: [(0, '4.520')] [2024-09-01 15:00:21,930][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000127_520192.pth... [2024-09-01 15:00:22,073][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth [2024-09-01 15:00:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 520192. Throughput: 0: 223.6. Samples: 131648. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:25,139][00194] Avg episode reward: [(0, '4.569')] [2024-09-01 15:00:30,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 524288. Throughput: 0: 225.9. Samples: 133258. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:30,141][00194] Avg episode reward: [(0, '4.595')] [2024-09-01 15:00:34,393][03034] Updated weights for policy 0, policy_version 130 (0.0539) [2024-09-01 15:00:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 532480. Throughput: 0: 232.9. Samples: 134654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:35,138][00194] Avg episode reward: [(0, '4.611')] [2024-09-01 15:00:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 536576. Throughput: 0: 233.0. Samples: 135442. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:40,140][00194] Avg episode reward: [(0, '4.575')] [2024-09-01 15:00:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 540672. Throughput: 0: 227.2. Samples: 136588. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:45,144][00194] Avg episode reward: [(0, '4.644')] [2024-09-01 15:00:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 544768. Throughput: 0: 234.0. Samples: 138056. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:50,138][00194] Avg episode reward: [(0, '4.662')] [2024-09-01 15:00:55,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 548864. Throughput: 0: 233.1. Samples: 138768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:55,144][00194] Avg episode reward: [(0, '4.767')] [2024-09-01 15:01:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 552960. Throughput: 0: 222.2. Samples: 139772. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:00,142][00194] Avg episode reward: [(0, '4.729')] [2024-09-01 15:01:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 557056. Throughput: 0: 227.7. Samples: 141580. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:05,139][00194] Avg episode reward: [(0, '4.637')] [2024-09-01 15:01:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 565248. Throughput: 0: 236.0. Samples: 142268. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:01:10,138][00194] Avg episode reward: [(0, '4.574')] [2024-09-01 15:01:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 565248. Throughput: 0: 233.5. Samples: 143766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:01:15,139][00194] Avg episode reward: [(0, '4.515')] [2024-09-01 15:01:20,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 569344. Throughput: 0: 225.2. Samples: 144786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:20,144][00194] Avg episode reward: [(0, '4.487')] [2024-09-01 15:01:20,351][03034] Updated weights for policy 0, policy_version 140 (0.1177) [2024-09-01 15:01:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 577536. Throughput: 0: 229.0. Samples: 145746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:25,144][00194] Avg episode reward: [(0, '4.569')] [2024-09-01 15:01:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 581632. Throughput: 0: 235.0. Samples: 147164. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:30,143][00194] Avg episode reward: [(0, '4.565')] [2024-09-01 15:01:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 585728. Throughput: 0: 226.4. Samples: 148242. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:35,147][00194] Avg episode reward: [(0, '4.511')] [2024-09-01 15:01:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 589824. Throughput: 0: 225.3. Samples: 148904. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:40,141][00194] Avg episode reward: [(0, '4.510')] [2024-09-01 15:01:45,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 593920. Throughput: 0: 238.1. Samples: 150488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:45,142][00194] Avg episode reward: [(0, '4.465')] [2024-09-01 15:01:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 598016. Throughput: 0: 230.5. Samples: 151954. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:50,142][00194] Avg episode reward: [(0, '4.498')] [2024-09-01 15:01:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 602112. Throughput: 0: 223.6. Samples: 152328. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:55,139][00194] Avg episode reward: [(0, '4.447')] [2024-09-01 15:02:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 610304. Throughput: 0: 227.8. Samples: 154016. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:00,139][00194] Avg episode reward: [(0, '4.431')] [2024-09-01 15:02:03,616][03034] Updated weights for policy 0, policy_version 150 (0.1017) [2024-09-01 15:02:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 614400. Throughput: 0: 238.2. Samples: 155506. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:05,138][00194] Avg episode reward: [(0, '4.470')] [2024-09-01 15:02:07,017][03021] Signal inference workers to stop experience collection... (150 times) [2024-09-01 15:02:07,091][03034] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-09-01 15:02:08,870][03021] Signal inference workers to resume experience collection... 
(150 times) [2024-09-01 15:02:08,878][03034] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-09-01 15:02:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 618496. Throughput: 0: 229.7. Samples: 156084. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:10,143][00194] Avg episode reward: [(0, '4.466')] [2024-09-01 15:02:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 622592. Throughput: 0: 220.9. Samples: 157106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:15,138][00194] Avg episode reward: [(0, '4.473')] [2024-09-01 15:02:17,310][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth... [2024-09-01 15:02:17,393][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth [2024-09-01 15:02:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 626688. Throughput: 0: 238.5. Samples: 158976. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:02:20,139][00194] Avg episode reward: [(0, '4.478')] [2024-09-01 15:02:25,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 630784. Throughput: 0: 235.4. Samples: 159500. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:02:25,143][00194] Avg episode reward: [(0, '4.544')] [2024-09-01 15:02:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 634880. Throughput: 0: 223.4. Samples: 160542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:30,144][00194] Avg episode reward: [(0, '4.618')] [2024-09-01 15:02:35,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 638976. Throughput: 0: 229.2. Samples: 162266. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:35,138][00194] Avg episode reward: [(0, '4.712')] [2024-09-01 15:02:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 647168. Throughput: 0: 239.9. Samples: 163124. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:02:40,143][00194] Avg episode reward: [(0, '4.754')] [2024-09-01 15:02:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 651264. Throughput: 0: 227.9. Samples: 164272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:02:45,140][00194] Avg episode reward: [(0, '4.803')] [2024-09-01 15:02:49,529][03021] Saving new best policy, reward=4.803! [2024-09-01 15:02:49,534][03034] Updated weights for policy 0, policy_version 160 (0.1163) [2024-09-01 15:02:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 655360. Throughput: 0: 217.7. Samples: 165302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:02:50,138][00194] Avg episode reward: [(0, '4.753')] [2024-09-01 15:02:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 659456. Throughput: 0: 227.1. Samples: 166302. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:55,142][00194] Avg episode reward: [(0, '4.763')] [2024-09-01 15:03:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 663552. Throughput: 0: 235.3. Samples: 167694. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:00,145][00194] Avg episode reward: [(0, '4.789')] [2024-09-01 15:03:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 667648. Throughput: 0: 218.0. Samples: 168786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:05,142][00194] Avg episode reward: [(0, '4.809')] [2024-09-01 15:03:07,580][03021] Saving new best policy, reward=4.809! 
[2024-09-01 15:03:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 671744. Throughput: 0: 219.8. Samples: 169390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:10,146][00194] Avg episode reward: [(0, '4.767')] [2024-09-01 15:03:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 675840. Throughput: 0: 239.8. Samples: 171332. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:15,140][00194] Avg episode reward: [(0, '4.704')] [2024-09-01 15:03:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 679936. Throughput: 0: 227.6. Samples: 172508. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:20,141][00194] Avg episode reward: [(0, '4.710')] [2024-09-01 15:03:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 684032. Throughput: 0: 217.0. Samples: 172890. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:25,145][00194] Avg episode reward: [(0, '4.687')] [2024-09-01 15:03:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 692224. Throughput: 0: 228.2. Samples: 174542. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:03:30,144][00194] Avg episode reward: [(0, '4.635')] [2024-09-01 15:03:33,067][03034] Updated weights for policy 0, policy_version 170 (0.0047) [2024-09-01 15:03:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 696320. Throughput: 0: 237.3. Samples: 175980. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:03:35,138][00194] Avg episode reward: [(0, '4.679')] [2024-09-01 15:03:40,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 696320. Throughput: 0: 224.8. Samples: 176420. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:03:40,147][00194] Avg episode reward: [(0, '4.692')] [2024-09-01 15:03:45,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 700416. Throughput: 0: 204.2. Samples: 176884. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:03:45,140][00194] Avg episode reward: [(0, '4.706')] [2024-09-01 15:03:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 704512. Throughput: 0: 215.0. Samples: 178462. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:03:50,138][00194] Avg episode reward: [(0, '4.624')] [2024-09-01 15:03:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 712704. Throughput: 0: 215.8. Samples: 179100. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:55,145][00194] Avg episode reward: [(0, '4.634')] [2024-09-01 15:04:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 712704. Throughput: 0: 204.0. Samples: 180512. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:00,139][00194] Avg episode reward: [(0, '4.586')] [2024-09-01 15:04:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 720896. Throughput: 0: 202.8. Samples: 181632. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:05,141][00194] Avg episode reward: [(0, '4.599')] [2024-09-01 15:04:10,137][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 724992. Throughput: 0: 218.0. Samples: 182700. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:10,142][00194] Avg episode reward: [(0, '4.503')] [2024-09-01 15:04:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 729088. Throughput: 0: 209.5. Samples: 183970. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:15,142][00194] Avg episode reward: [(0, '4.625')] [2024-09-01 15:04:18,125][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth... [2024-09-01 15:04:18,227][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000127_520192.pth [2024-09-01 15:04:20,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 733184. Throughput: 0: 201.3. Samples: 185038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:20,139][00194] Avg episode reward: [(0, '4.690')] [2024-09-01 15:04:22,805][03034] Updated weights for policy 0, policy_version 180 (0.1020) [2024-09-01 15:04:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 737280. Throughput: 0: 206.0. Samples: 185692. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:25,139][00194] Avg episode reward: [(0, '4.697')] [2024-09-01 15:04:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 741376. Throughput: 0: 233.2. Samples: 187380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:30,139][00194] Avg episode reward: [(0, '4.765')] [2024-09-01 15:04:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 745472. Throughput: 0: 228.0. Samples: 188724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:04:35,138][00194] Avg episode reward: [(0, '4.752')] [2024-09-01 15:04:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 749568. Throughput: 0: 221.6. Samples: 189072. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 15:04:40,138][00194] Avg episode reward: [(0, '4.782')] [2024-09-01 15:04:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 757760. Throughput: 0: 228.0. Samples: 190772. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 15:04:45,138][00194] Avg episode reward: [(0, '4.886')] [2024-09-01 15:04:48,333][03021] Saving new best policy, reward=4.886! [2024-09-01 15:04:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 761856. Throughput: 0: 234.7. Samples: 192194. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 15:04:50,142][00194] Avg episode reward: [(0, '4.834')] [2024-09-01 15:04:55,139][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 765952. Throughput: 0: 226.4. Samples: 192890. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 15:04:55,144][00194] Avg episode reward: [(0, '4.818')] [2024-09-01 15:05:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 770048. Throughput: 0: 222.3. Samples: 193974. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 15:05:00,139][00194] Avg episode reward: [(0, '4.824')] [2024-09-01 15:05:05,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 774144. Throughput: 0: 239.6. Samples: 195820. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:05:05,143][00194] Avg episode reward: [(0, '4.840')] [2024-09-01 15:05:06,480][03034] Updated weights for policy 0, policy_version 190 (0.1028) [2024-09-01 15:05:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 778240. Throughput: 0: 234.8. Samples: 196256. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:05:10,146][00194] Avg episode reward: [(0, '4.802')] [2024-09-01 15:05:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 782336. Throughput: 0: 222.0. Samples: 197368. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:05:15,138][00194] Avg episode reward: [(0, '4.795')] [2024-09-01 15:05:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 786432. 
Throughput: 0: 231.2. Samples: 199130. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:20,138][00194] Avg episode reward: [(0, '4.893')]
[2024-09-01 15:05:24,273][03021] Saving new best policy, reward=4.893!
[2024-09-01 15:05:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 794624. Throughput: 0: 242.6. Samples: 199990. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:25,138][00194] Avg episode reward: [(0, '4.792')]
[2024-09-01 15:05:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 798720. Throughput: 0: 230.7. Samples: 201154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:05:30,145][00194] Avg episode reward: [(0, '4.761')]
[2024-09-01 15:05:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 802816. Throughput: 0: 222.8. Samples: 202222. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:05:35,139][00194] Avg episode reward: [(0, '4.781')]
[2024-09-01 15:05:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 806912. Throughput: 0: 229.4. Samples: 203212. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:40,138][00194] Avg episode reward: [(0, '4.901')]
[2024-09-01 15:05:42,114][03021] Saving new best policy, reward=4.901!
[2024-09-01 15:05:45,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 811008. Throughput: 0: 236.3. Samples: 204608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:45,140][00194] Avg episode reward: [(0, '4.914')]
[2024-09-01 15:05:48,029][03021] Saving new best policy, reward=4.914!
[2024-09-01 15:05:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 815104. Throughput: 0: 218.4. Samples: 205650. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:50,139][00194] Avg episode reward: [(0, '4.971')]
[2024-09-01 15:05:52,732][03021] Saving new best policy, reward=4.971!
[2024-09-01 15:05:52,737][03034] Updated weights for policy 0, policy_version 200 (0.0549)
[2024-09-01 15:05:54,974][03021] Signal inference workers to stop experience collection... (200 times)
[2024-09-01 15:05:55,011][03034] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-09-01 15:05:55,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 819200. Throughput: 0: 223.1. Samples: 206294. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:55,138][00194] Avg episode reward: [(0, '4.972')]
[2024-09-01 15:05:56,461][03021] Signal inference workers to resume experience collection... (200 times)
[2024-09-01 15:05:56,462][03034] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-09-01 15:06:00,144][00194] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 823296. Throughput: 0: 240.2. Samples: 208180. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:00,150][00194] Avg episode reward: [(0, '4.914')]
[2024-09-01 15:06:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 827392. Throughput: 0: 227.9. Samples: 209384. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:05,139][00194] Avg episode reward: [(0, '4.894')]
[2024-09-01 15:06:10,136][00194] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 831488. Throughput: 0: 217.1. Samples: 209760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:10,144][00194] Avg episode reward: [(0, '4.865')]
[2024-09-01 15:06:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 839680. Throughput: 0: 227.7. Samples: 211400. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:15,138][00194] Avg episode reward: [(0, '4.890')]
[2024-09-01 15:06:18,004][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000206_843776.pth...
[2024-09-01 15:06:18,115][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth
[2024-09-01 15:06:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 843776. Throughput: 0: 237.2. Samples: 212898. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:20,141][00194] Avg episode reward: [(0, '4.935')]
[2024-09-01 15:06:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 847872. Throughput: 0: 226.6. Samples: 213408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:25,141][00194] Avg episode reward: [(0, '4.932')]
[2024-09-01 15:06:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 851968. Throughput: 0: 220.2. Samples: 214516. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:30,139][00194] Avg episode reward: [(0, '5.019')]
[2024-09-01 15:06:32,231][03021] Saving new best policy, reward=5.019!
[2024-09-01 15:06:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 856064. Throughput: 0: 241.8. Samples: 216530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:35,139][00194] Avg episode reward: [(0, '4.896')]
[2024-09-01 15:06:36,152][03034] Updated weights for policy 0, policy_version 210 (0.1038)
[2024-09-01 15:06:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 860160. Throughput: 0: 237.0. Samples: 216960. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:40,138][00194] Avg episode reward: [(0, '5.017')]
[2024-09-01 15:06:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 864256. Throughput: 0: 219.9. Samples: 218074. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:45,138][00194] Avg episode reward: [(0, '5.050')]
[2024-09-01 15:06:50,137][03021] Saving new best policy, reward=5.050!
[2024-09-01 15:06:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 868352. Throughput: 0: 228.8. Samples: 219678. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:50,148][00194] Avg episode reward: [(0, '5.099')]
[2024-09-01 15:06:54,115][03021] Saving new best policy, reward=5.099!
[2024-09-01 15:06:55,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 876544. Throughput: 0: 241.8. Samples: 220640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:55,143][00194] Avg episode reward: [(0, '5.143')]
[2024-09-01 15:06:59,804][03021] Saving new best policy, reward=5.143!
[2024-09-01 15:07:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.9, 300 sec: 902.5). Total num frames: 880640. Throughput: 0: 230.2. Samples: 221760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:00,143][00194] Avg episode reward: [(0, '5.173')]
[2024-09-01 15:07:04,754][03021] Saving new best policy, reward=5.173!
[2024-09-01 15:07:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 884736. Throughput: 0: 221.0. Samples: 222844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:05,139][00194] Avg episode reward: [(0, '5.234')]
[2024-09-01 15:07:08,565][03021] Saving new best policy, reward=5.234!
[2024-09-01 15:07:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 888832. Throughput: 0: 231.5. Samples: 223826. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:07:10,139][00194] Avg episode reward: [(0, '5.251')]
[2024-09-01 15:07:12,406][03021] Saving new best policy, reward=5.251!
[2024-09-01 15:07:15,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 892928. Throughput: 0: 236.7. Samples: 225168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:07:15,144][00194] Avg episode reward: [(0, '5.313')]
[2024-09-01 15:07:18,300][03021] Saving new best policy, reward=5.313!
[2024-09-01 15:07:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 897024. Throughput: 0: 215.1. Samples: 226208. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:07:20,141][00194] Avg episode reward: [(0, '5.267')]
[2024-09-01 15:07:22,874][03034] Updated weights for policy 0, policy_version 220 (0.0056)
[2024-09-01 15:07:25,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 901120. Throughput: 0: 221.4. Samples: 226922. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:07:25,138][00194] Avg episode reward: [(0, '5.138')]
[2024-09-01 15:07:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 905216. Throughput: 0: 237.6. Samples: 228766. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:07:30,147][00194] Avg episode reward: [(0, '5.040')]
[2024-09-01 15:07:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 909312. Throughput: 0: 227.8. Samples: 229928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:07:35,138][00194] Avg episode reward: [(0, '4.980')]
[2024-09-01 15:07:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 913408. Throughput: 0: 213.1. Samples: 230228. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:07:40,143][00194] Avg episode reward: [(0, '4.943')]
[2024-09-01 15:07:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 921600. Throughput: 0: 226.8. Samples: 231964. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:45,143][00194] Avg episode reward: [(0, '5.038')]
[2024-09-01 15:07:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 925696. Throughput: 0: 232.8. Samples: 233320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:50,142][00194] Avg episode reward: [(0, '5.004')]
[2024-09-01 15:07:55,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 929792. Throughput: 0: 225.4. Samples: 233968. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:55,144][00194] Avg episode reward: [(0, '5.034')]
[2024-09-01 15:08:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 933888. Throughput: 0: 220.1. Samples: 235070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:08:00,138][00194] Avg episode reward: [(0, '5.008')]
[2024-09-01 15:08:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 937984. Throughput: 0: 240.2. Samples: 237016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:08:05,139][00194] Avg episode reward: [(0, '5.089')]
[2024-09-01 15:08:06,015][03034] Updated weights for policy 0, policy_version 230 (0.0049)
[2024-09-01 15:08:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 942080. Throughput: 0: 232.2. Samples: 237372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:08:10,139][00194] Avg episode reward: [(0, '5.118')]
[2024-09-01 15:08:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 946176. Throughput: 0: 216.0. Samples: 238488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:08:15,141][00194] Avg episode reward: [(0, '5.120')]
[2024-09-01 15:08:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 950272. Throughput: 0: 227.3. Samples: 240156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:08:20,148][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000233_954368.pth...
[2024-09-01 15:08:20,144][00194] Avg episode reward: [(0, '5.153')]
[2024-09-01 15:08:20,244][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth
[2024-09-01 15:08:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 958464. Throughput: 0: 243.2. Samples: 241170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:08:25,145][00194] Avg episode reward: [(0, '5.289')]
[2024-09-01 15:08:30,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 962560. Throughput: 0: 227.2. Samples: 242190. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:08:30,141][00194] Avg episode reward: [(0, '5.218')]
[2024-09-01 15:08:35,139][00194] Fps is (10 sec: 819.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 966656. Throughput: 0: 223.0. Samples: 243356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:08:35,141][00194] Avg episode reward: [(0, '5.357')]
[2024-09-01 15:08:38,012][03021] Saving new best policy, reward=5.357!
[2024-09-01 15:08:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 970752. Throughput: 0: 228.4. Samples: 244246. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:08:40,138][00194] Avg episode reward: [(0, '5.368')]
[2024-09-01 15:08:41,954][03021] Saving new best policy, reward=5.368!
[2024-09-01 15:08:45,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 974848. Throughput: 0: 234.8. Samples: 245638. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:08:45,142][00194] Avg episode reward: [(0, '5.387')]
[2024-09-01 15:08:47,650][03021] Saving new best policy, reward=5.387!
[2024-09-01 15:08:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 978944. Throughput: 0: 216.0. Samples: 246738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:08:50,141][00194] Avg episode reward: [(0, '5.273')]
[2024-09-01 15:08:51,987][03034] Updated weights for policy 0, policy_version 240 (0.0550)
[2024-09-01 15:08:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 983040. Throughput: 0: 224.0. Samples: 247452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:08:55,139][00194] Avg episode reward: [(0, '5.202')]
[2024-09-01 15:09:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 991232. Throughput: 0: 240.3. Samples: 249302. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:00,140][00194] Avg episode reward: [(0, '5.264')]
[2024-09-01 15:09:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 991232. Throughput: 0: 225.0. Samples: 250282. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:05,139][00194] Avg episode reward: [(0, '5.497')]
[2024-09-01 15:09:09,936][03021] Saving new best policy, reward=5.497!
[2024-09-01 15:09:10,137][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 999424. Throughput: 0: 215.1. Samples: 250848. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:10,141][00194] Avg episode reward: [(0, '5.322')]
[2024-09-01 15:09:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1003520. Throughput: 0: 225.7. Samples: 252344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:15,138][00194] Avg episode reward: [(0, '5.286')]
[2024-09-01 15:09:20,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1007616. Throughput: 0: 232.5. Samples: 253816. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:20,141][00194] Avg episode reward: [(0, '5.171')]
[2024-09-01 15:09:25,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1011712. Throughput: 0: 225.7. Samples: 254404. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:25,143][00194] Avg episode reward: [(0, '5.128')]
[2024-09-01 15:09:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1015808. Throughput: 0: 224.1. Samples: 255722. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:09:30,140][00194] Avg episode reward: [(0, '5.294')]
[2024-09-01 15:09:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1019904. Throughput: 0: 239.0. Samples: 257494. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:09:35,144][00194] Avg episode reward: [(0, '5.326')]
[2024-09-01 15:09:35,395][03034] Updated weights for policy 0, policy_version 250 (0.0061)
[2024-09-01 15:09:39,394][03021] Signal inference workers to stop experience collection... (250 times)
[2024-09-01 15:09:39,524][03034] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-09-01 15:09:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1024000. Throughput: 0: 234.9. Samples: 258024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:09:40,141][00194] Avg episode reward: [(0, '5.417')]
[2024-09-01 15:09:41,133][03021] Signal inference workers to resume experience collection... (250 times)
[2024-09-01 15:09:41,136][03034] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-09-01 15:09:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1028096. Throughput: 0: 219.5. Samples: 259178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:09:45,139][00194] Avg episode reward: [(0, '5.498')]
[2024-09-01 15:09:49,340][03021] Saving new best policy, reward=5.498!
[2024-09-01 15:09:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1036288. Throughput: 0: 230.7. Samples: 260664. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:09:50,139][00194] Avg episode reward: [(0, '5.447')]
[2024-09-01 15:09:55,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1040384. Throughput: 0: 238.4. Samples: 261574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:09:55,153][00194] Avg episode reward: [(0, '5.496')]
[2024-09-01 15:10:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1044480. Throughput: 0: 227.6. Samples: 262586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:10:00,139][00194] Avg episode reward: [(0, '5.549')]
[2024-09-01 15:10:03,716][03021] Saving new best policy, reward=5.549!
[2024-09-01 15:10:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1048576. Throughput: 0: 225.6. Samples: 263966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:10:05,144][00194] Avg episode reward: [(0, '5.649')]
[2024-09-01 15:10:07,553][03021] Saving new best policy, reward=5.649!
[2024-09-01 15:10:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1052672. Throughput: 0: 227.4. Samples: 264638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:10,146][00194] Avg episode reward: [(0, '5.691')]
[2024-09-01 15:10:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1056768. Throughput: 0: 228.5. Samples: 266004. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:15,139][00194] Avg episode reward: [(0, '5.710')]
[2024-09-01 15:10:17,000][03021] Saving new best policy, reward=5.691!
[2024-09-01 15:10:17,103][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth...
[2024-09-01 15:10:17,286][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000206_843776.pth
[2024-09-01 15:10:17,310][03021] Saving new best policy, reward=5.710!
[2024-09-01 15:10:20,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1060864. Throughput: 0: 217.1. Samples: 267262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:20,139][00194] Avg episode reward: [(0, '5.703')]
[2024-09-01 15:10:21,943][03034] Updated weights for policy 0, policy_version 260 (0.0722)
[2024-09-01 15:10:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1064960. Throughput: 0: 219.8. Samples: 267914. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:10:25,139][00194] Avg episode reward: [(0, '5.700')]
[2024-09-01 15:10:30,136][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1073152. Throughput: 0: 234.6. Samples: 269734. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:30,139][00194] Avg episode reward: [(0, '5.906')]
[2024-09-01 15:10:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1073152. Throughput: 0: 224.4. Samples: 270760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:35,138][00194] Avg episode reward: [(0, '5.824')]
[2024-09-01 15:10:35,459][03021] Saving new best policy, reward=5.906!
[2024-09-01 15:10:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1081344. Throughput: 0: 215.5. Samples: 271270. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:40,139][00194] Avg episode reward: [(0, '5.875')]
[2024-09-01 15:10:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1085440. Throughput: 0: 227.7. Samples: 272832. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:45,138][00194] Avg episode reward: [(0, '5.695')]
[2024-09-01 15:10:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1089536. Throughput: 0: 231.3. Samples: 274376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:50,138][00194] Avg episode reward: [(0, '5.816')]
[2024-09-01 15:10:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1093632. Throughput: 0: 227.4. Samples: 274872. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:10:55,141][00194] Avg episode reward: [(0, '5.754')]
[2024-09-01 15:11:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1097728. Throughput: 0: 221.0. Samples: 275948. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:11:00,139][00194] Avg episode reward: [(0, '5.781')]
[2024-09-01 15:11:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1101824. Throughput: 0: 238.0. Samples: 277970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:11:05,144][00194] Avg episode reward: [(0, '5.757')]
[2024-09-01 15:11:05,615][03034] Updated weights for policy 0, policy_version 270 (0.1032)
[2024-09-01 15:11:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1105920. Throughput: 0: 235.9. Samples: 278528. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:11:10,138][00194] Avg episode reward: [(0, '5.848')]
[2024-09-01 15:11:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1110016. Throughput: 0: 218.7. Samples: 279576. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:11:15,143][00194] Avg episode reward: [(0, '5.596')]
[2024-09-01 15:11:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1118208. Throughput: 0: 230.5. Samples: 281134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:11:20,142][00194] Avg episode reward: [(0, '5.533')]
[2024-09-01 15:11:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1122304. Throughput: 0: 240.7. Samples: 282100. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:11:25,138][00194] Avg episode reward: [(0, '5.706')]
[2024-09-01 15:11:30,140][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1126400. Throughput: 0: 229.2. Samples: 283146. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:11:30,152][00194] Avg episode reward: [(0, '5.779')]
[2024-09-01 15:11:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1130496. Throughput: 0: 225.8. Samples: 284536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:11:35,139][00194] Avg episode reward: [(0, '5.730')]
[2024-09-01 15:11:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1134592. Throughput: 0: 230.4. Samples: 285242. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:11:40,144][00194] Avg episode reward: [(0, '5.756')]
[2024-09-01 15:11:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1138688. Throughput: 0: 244.0. Samples: 286930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:11:45,138][00194] Avg episode reward: [(0, '5.751')]
[2024-09-01 15:11:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1142784. Throughput: 0: 222.8. Samples: 287998. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:11:50,142][00194] Avg episode reward: [(0, '5.934')]
[2024-09-01 15:11:51,522][03021] Saving new best policy, reward=5.934!
[2024-09-01 15:11:51,531][03034] Updated weights for policy 0, policy_version 280 (0.0527)
[2024-09-01 15:11:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1146880. Throughput: 0: 226.8. Samples: 288732. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:11:55,141][00194] Avg episode reward: [(0, '6.167')]
[2024-09-01 15:11:59,094][03021] Saving new best policy, reward=6.167!
[2024-09-01 15:12:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1155072. Throughput: 0: 238.4. Samples: 290302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:12:00,143][00194] Avg episode reward: [(0, '6.073')]
[2024-09-01 15:12:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1159168. Throughput: 0: 228.0. Samples: 291394. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:12:05,141][00194] Avg episode reward: [(0, '6.193')]
[2024-09-01 15:12:09,569][03021] Saving new best policy, reward=6.193!
[2024-09-01 15:12:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1163264. Throughput: 0: 222.5. Samples: 292112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:12:10,142][00194] Avg episode reward: [(0, '6.283')]
[2024-09-01 15:12:13,418][03021] Saving new best policy, reward=6.283!
[2024-09-01 15:12:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1167360. Throughput: 0: 230.2. Samples: 293504. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:12:15,143][00194] Avg episode reward: [(0, '6.266')]
[2024-09-01 15:12:17,182][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000286_1171456.pth...
[2024-09-01 15:12:17,285][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000233_954368.pth
[2024-09-01 15:12:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1171456. Throughput: 0: 236.5. Samples: 295178. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:12:20,145][00194] Avg episode reward: [(0, '6.343')]
[2024-09-01 15:12:22,818][03021] Saving new best policy, reward=6.343!
[2024-09-01 15:12:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1175552. Throughput: 0: 228.8. Samples: 295536. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:12:25,148][00194] Avg episode reward: [(0, '6.317')]
[2024-09-01 15:12:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1179648. Throughput: 0: 222.4. Samples: 296940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:12:30,139][00194] Avg episode reward: [(0, '6.348')]
[2024-09-01 15:12:34,929][03021] Saving new best policy, reward=6.348!
[2024-09-01 15:12:34,946][03034] Updated weights for policy 0, policy_version 290 (0.0073)
[2024-09-01 15:12:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1187840. Throughput: 0: 235.0. Samples: 298572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:12:35,140][00194] Avg episode reward: [(0, '6.431')]
[2024-09-01 15:12:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1187840. Throughput: 0: 234.4. Samples: 299280. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:12:40,139][00194] Avg episode reward: [(0, '6.319')]
[2024-09-01 15:12:40,529][03021] Saving new best policy, reward=6.431!
[2024-09-01 15:12:45,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1191936. Throughput: 0: 224.2. Samples: 300392. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:12:45,138][00194] Avg episode reward: [(0, '6.147')]
[2024-09-01 15:12:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1200128. Throughput: 0: 229.2. Samples: 301710. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:12:50,143][00194] Avg episode reward: [(0, '6.167')]
[2024-09-01 15:12:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1204224. Throughput: 0: 236.3. Samples: 302744. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:12:55,152][00194] Avg episode reward: [(0, '6.236')]
[2024-09-01 15:13:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1208320. Throughput: 0: 228.6. Samples: 303792. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:13:00,138][00194] Avg episode reward: [(0, '6.283')]
[2024-09-01 15:13:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1212416. Throughput: 0: 219.6. Samples: 305060. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:13:05,138][00194] Avg episode reward: [(0, '6.165')]
[2024-09-01 15:13:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1216512. Throughput: 0: 228.8. Samples: 305830. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:13:10,140][00194] Avg episode reward: [(0, '6.279')]
[2024-09-01 15:13:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1220608. Throughput: 0: 229.5. Samples: 307266. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:13:15,139][00194] Avg episode reward: [(0, '6.328')]
[2024-09-01 15:13:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1224704. Throughput: 0: 221.9. Samples: 308558. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:13:20,145][00194] Avg episode reward: [(0, '6.246')]
[2024-09-01 15:13:21,318][03034] Updated weights for policy 0, policy_version 300 (0.2803)
[2024-09-01 15:13:23,529][03021] Signal inference workers to stop experience collection... (300 times)
[2024-09-01 15:13:23,568][03034] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2024-09-01 15:13:24,995][03021] Signal inference workers to resume experience collection... (300 times)
[2024-09-01 15:13:24,996][03034] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2024-09-01 15:13:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1232896. Throughput: 0: 224.6. Samples: 309386. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:13:25,142][00194] Avg episode reward: [(0, '6.128')]
[2024-09-01 15:13:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1236992. Throughput: 0: 232.9. Samples: 310872. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:13:30,141][00194] Avg episode reward: [(0, '6.167')]
[2024-09-01 15:13:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1241088. Throughput: 0: 226.8. Samples: 311914. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:13:35,142][00194] Avg episode reward: [(0, '6.330')]
[2024-09-01 15:13:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1245184. Throughput: 0: 218.6. Samples: 312580. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:13:40,139][00194] Avg episode reward: [(0, '6.510')]
[2024-09-01 15:13:42,819][03021] Saving new best policy, reward=6.510!
[2024-09-01 15:13:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1249280. Throughput: 0: 231.6. Samples: 314216. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:13:45,142][00194] Avg episode reward: [(0, '6.641')]
[2024-09-01 15:13:50,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1253376. Throughput: 0: 220.0. Samples: 314960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:13:50,145][00194] Avg episode reward: [(0, '6.569')]
[2024-09-01 15:13:55,152][00194] Fps is (10 sec: 409.0, 60 sec: 819.0, 300 sec: 888.6). Total num frames: 1253376. Throughput: 0: 210.5. Samples: 315306. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:13:55,161][00194] Avg episode reward: [(0, '6.626')]
[2024-09-01 15:13:59,760][03021] Saving new best policy, reward=6.641!
[2024-09-01 15:14:00,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1257472. Throughput: 0: 190.1. Samples: 315820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:14:00,142][00194] Avg episode reward: [(0, '6.549')]
[2024-09-01 15:14:05,136][00194] Fps is (10 sec: 410.2, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 1257472. Throughput: 0: 184.2. Samples: 316846. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:14:05,142][00194] Avg episode reward: [(0, '6.598')]
[2024-09-01 15:14:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1265664. Throughput: 0: 181.2. Samples: 317540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:14:10,141][00194] Avg episode reward: [(0, '6.628')]
[2024-09-01 15:14:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1269760. Throughput: 0: 178.1. Samples: 318886. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:15,145][03034] Updated weights for policy 0, policy_version 310 (0.3333)
[2024-09-01 15:14:15,146][00194] Avg episode reward: [(0, '6.511')]
[2024-09-01 15:14:19,878][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth...
[2024-09-01 15:14:19,988][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth
[2024-09-01 15:14:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1273856. Throughput: 0: 177.3. Samples: 319894. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:20,144][00194] Avg episode reward: [(0, '6.553')]
[2024-09-01 15:14:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1277952. Throughput: 0: 185.9. Samples: 320944. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:25,139][00194] Avg episode reward: [(0, '6.714')]
[2024-09-01 15:14:27,506][03021] Saving new best policy, reward=6.714!
[2024-09-01 15:14:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1282048. Throughput: 0: 179.1. Samples: 322274. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:30,144][00194] Avg episode reward: [(0, '6.913')]
[2024-09-01 15:14:33,142][03021] Saving new best policy, reward=6.913!
[2024-09-01 15:14:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1286144. Throughput: 0: 185.7. Samples: 323316. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:35,139][00194] Avg episode reward: [(0, '6.828')]
[2024-09-01 15:14:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1290240. Throughput: 0: 193.5. Samples: 324012. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:40,139][00194] Avg episode reward: [(0, '6.677')]
[2024-09-01 15:14:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 1294336. Throughput: 0: 218.5. Samples: 325652. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:14:45,145][00194] Avg episode reward: [(0, '6.578')]
[2024-09-01 15:14:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 874.7). Total num frames: 1298432. Throughput: 0: 225.8. Samples: 327008. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:14:50,141][00194] Avg episode reward: [(0, '6.610')]
[2024-09-01 15:14:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.4, 300 sec: 874.7). Total num frames: 1302528. Throughput: 0: 218.0. Samples: 327352. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:55,140][00194] Avg episode reward: [(0, '6.819')]
[2024-09-01 15:14:59,874][03034] Updated weights for policy 0, policy_version 320 (0.0546)
[2024-09-01 15:15:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1310720. Throughput: 0: 226.2. Samples: 329064. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:15:00,142][00194] Avg episode reward: [(0, '6.961')]
[2024-09-01 15:15:03,678][03021] Saving new best policy, reward=6.961!
[2024-09-01 15:15:05,140][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1314816. Throughput: 0: 235.3. Samples: 330482. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:15:05,151][00194] Avg episode reward: [(0, '7.015')]
[2024-09-01 15:15:09,277][03021] Saving new best policy, reward=7.015!
[2024-09-01 15:15:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1318912. Throughput: 0: 227.0. Samples: 331158. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:10,145][00194] Avg episode reward: [(0, '7.072')]
[2024-09-01 15:15:14,145][03021] Saving new best policy, reward=7.072!
[2024-09-01 15:15:15,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1323008. Throughput: 0: 221.0. Samples: 332220. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:15,143][00194] Avg episode reward: [(0, '7.007')]
[2024-09-01 15:15:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1327104. Throughput: 0: 234.8. Samples: 333882. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:20,143][00194] Avg episode reward: [(0, '6.911')]
[2024-09-01 15:15:25,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1331200. Throughput: 0: 234.4. Samples: 334562. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:25,147][00194] Avg episode reward: [(0, '6.986')]
[2024-09-01 15:15:30,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1335296. Throughput: 0: 221.2. Samples: 335608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:30,143][00194] Avg episode reward: [(0, '6.933')]
[2024-09-01 15:15:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1339392. Throughput: 0: 225.0. Samples: 337132. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:35,139][00194] Avg episode reward: [(0, '7.059')]
[2024-09-01 15:15:40,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1347584. Throughput: 0: 237.3. Samples: 338032. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:40,142][00194] Avg episode reward: [(0, '7.376')]
[2024-09-01 15:15:45,089][03021] Saving new best policy, reward=7.376!
[2024-09-01 15:15:45,099][03034] Updated weights for policy 0, policy_version 330 (0.0529) [2024-09-01 15:15:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1351680. Throughput: 0: 229.2. Samples: 339380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:15:45,139][00194] Avg episode reward: [(0, '7.517')] [2024-09-01 15:15:49,923][03021] Saving new best policy, reward=7.517! [2024-09-01 15:15:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1355776. Throughput: 0: 220.7. Samples: 340412. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:15:50,139][00194] Avg episode reward: [(0, '7.502')] [2024-09-01 15:15:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1359872. Throughput: 0: 229.2. Samples: 341472. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:15:55,139][00194] Avg episode reward: [(0, '7.252')] [2024-09-01 15:16:00,139][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1363968. Throughput: 0: 236.3. Samples: 342856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:16:00,147][00194] Avg episode reward: [(0, '7.138')] [2024-09-01 15:16:05,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1368064. Throughput: 0: 219.8. Samples: 343772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:16:05,144][00194] Avg episode reward: [(0, '7.174')] [2024-09-01 15:16:10,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1372160. Throughput: 0: 224.0. Samples: 344644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:16:10,142][00194] Avg episode reward: [(0, '7.181')] [2024-09-01 15:16:15,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1376256. Throughput: 0: 241.7. Samples: 346482. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:16:15,144][00194] Avg episode reward: [(0, '7.125')] [2024-09-01 15:16:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1380352. Throughput: 0: 233.2. Samples: 347628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:16:20,139][00194] Avg episode reward: [(0, '7.162')] [2024-09-01 15:16:21,157][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000338_1384448.pth... [2024-09-01 15:16:21,298][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000286_1171456.pth [2024-09-01 15:16:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 1384448. Throughput: 0: 224.4. Samples: 348128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:16:25,146][00194] Avg episode reward: [(0, '7.133')] [2024-09-01 15:16:29,921][03034] Updated weights for policy 0, policy_version 340 (0.1170) [2024-09-01 15:16:30,137][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1392640. Throughput: 0: 232.4. Samples: 349840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:16:30,139][00194] Avg episode reward: [(0, '7.302')] [2024-09-01 15:16:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1396736. Throughput: 0: 234.5. Samples: 350966. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:16:35,141][00194] Avg episode reward: [(0, '7.031')] [2024-09-01 15:16:40,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1400832. Throughput: 0: 226.8. Samples: 351678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:16:40,140][00194] Avg episode reward: [(0, '6.916')] [2024-09-01 15:16:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1404928. Throughput: 0: 224.9. Samples: 352976. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:16:45,139][00194] Avg episode reward: [(0, '6.753')] [2024-09-01 15:16:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1409024. Throughput: 0: 243.8. Samples: 354744. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:16:50,154][00194] Avg episode reward: [(0, '6.763')] [2024-09-01 15:16:55,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1413120. Throughput: 0: 233.5. Samples: 355154. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:16:55,147][00194] Avg episode reward: [(0, '6.884')] [2024-09-01 15:17:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1417216. Throughput: 0: 218.9. Samples: 356334. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:17:00,141][00194] Avg episode reward: [(0, '6.823')] [2024-09-01 15:17:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1421312. Throughput: 0: 230.3. Samples: 357992. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:17:05,144][00194] Avg episode reward: [(0, '7.117')] [2024-09-01 15:17:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1429504. Throughput: 0: 236.2. Samples: 358756. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:10,139][00194] Avg episode reward: [(0, '7.097')] [2024-09-01 15:17:14,793][03034] Updated weights for policy 0, policy_version 350 (0.1582) [2024-09-01 15:17:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1433600. Throughput: 0: 225.8. Samples: 360000. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:15,139][00194] Avg episode reward: [(0, '7.085')] [2024-09-01 15:17:18,875][03021] Signal inference workers to stop experience collection... 
(350 times) [2024-09-01 15:17:18,910][03034] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-09-01 15:17:19,857][03021] Signal inference workers to resume experience collection... (350 times) [2024-09-01 15:17:19,859][03034] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-09-01 15:17:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1437696. Throughput: 0: 224.4. Samples: 361062. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:20,145][00194] Avg episode reward: [(0, '7.128')] [2024-09-01 15:17:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1441792. Throughput: 0: 229.5. Samples: 362004. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:25,143][00194] Avg episode reward: [(0, '7.602')] [2024-09-01 15:17:27,477][03021] Saving new best policy, reward=7.602! [2024-09-01 15:17:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1445888. Throughput: 0: 236.4. Samples: 363612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:30,138][00194] Avg episode reward: [(0, '7.554')] [2024-09-01 15:17:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1449984. Throughput: 0: 219.3. Samples: 364612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:35,139][00194] Avg episode reward: [(0, '7.587')] [2024-09-01 15:17:40,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1454080. Throughput: 0: 222.6. Samples: 365170. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:40,139][00194] Avg episode reward: [(0, '7.781')] [2024-09-01 15:17:41,646][03021] Saving new best policy, reward=7.781! [2024-09-01 15:17:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1458176. Throughput: 0: 239.3. Samples: 367104. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:45,146][00194] Avg episode reward: [(0, '7.998')] [2024-09-01 15:17:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1462272. Throughput: 0: 228.2. Samples: 368260. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:17:50,139][00194] Avg episode reward: [(0, '8.053')] [2024-09-01 15:17:50,553][03021] Saving new best policy, reward=7.998! [2024-09-01 15:17:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1466368. Throughput: 0: 220.9. Samples: 368696. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:17:55,140][00194] Avg episode reward: [(0, '8.182')] [2024-09-01 15:17:55,831][03021] Saving new best policy, reward=8.053! [2024-09-01 15:17:59,646][03021] Saving new best policy, reward=8.182! [2024-09-01 15:17:59,666][03034] Updated weights for policy 0, policy_version 360 (0.1053) [2024-09-01 15:18:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1474560. Throughput: 0: 230.3. Samples: 370362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:00,139][00194] Avg episode reward: [(0, '8.015')] [2024-09-01 15:18:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1478656. Throughput: 0: 233.6. Samples: 371574. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:05,138][00194] Avg episode reward: [(0, '8.058')] [2024-09-01 15:18:10,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1482752. Throughput: 0: 228.5. Samples: 372288. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:18:10,151][00194] Avg episode reward: [(0, '8.125')] [2024-09-01 15:18:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1486848. Throughput: 0: 216.3. Samples: 373346. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:18:15,140][00194] Avg episode reward: [(0, '7.960')] [2024-09-01 15:18:17,522][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000364_1490944.pth... [2024-09-01 15:18:17,625][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth [2024-09-01 15:18:20,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1490944. Throughput: 0: 236.2. Samples: 375240. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:20,147][00194] Avg episode reward: [(0, '7.430')] [2024-09-01 15:18:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1495040. Throughput: 0: 233.9. Samples: 375696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:25,141][00194] Avg episode reward: [(0, '7.439')] [2024-09-01 15:18:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1499136. Throughput: 0: 218.5. Samples: 376938. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:30,139][00194] Avg episode reward: [(0, '7.559')] [2024-09-01 15:18:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1503232. Throughput: 0: 224.2. Samples: 378348. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:35,147][00194] Avg episode reward: [(0, '7.365')] [2024-09-01 15:18:40,136][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1511424. Throughput: 0: 236.2. Samples: 379326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:18:40,138][00194] Avg episode reward: [(0, '7.387')] [2024-09-01 15:18:44,260][03034] Updated weights for policy 0, policy_version 370 (0.1036) [2024-09-01 15:18:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1515520. Throughput: 0: 225.3. Samples: 380500. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:18:45,140][00194] Avg episode reward: [(0, '7.382')] [2024-09-01 15:18:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.6). Total num frames: 1519616. Throughput: 0: 223.1. Samples: 381612. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:18:50,139][00194] Avg episode reward: [(0, '7.451')] [2024-09-01 15:18:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1523712. Throughput: 0: 226.3. Samples: 382472. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:18:55,139][00194] Avg episode reward: [(0, '7.643')] [2024-09-01 15:19:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1527808. Throughput: 0: 241.7. Samples: 384222. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:19:00,139][00194] Avg episode reward: [(0, '7.557')] [2024-09-01 15:19:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1531904. Throughput: 0: 221.1. Samples: 385188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:19:05,141][00194] Avg episode reward: [(0, '7.454')] [2024-09-01 15:19:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 1536000. Throughput: 0: 223.6. Samples: 385758. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:19:10,147][00194] Avg episode reward: [(0, '7.521')] [2024-09-01 15:19:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1544192. Throughput: 0: 237.3. Samples: 387616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:19:15,143][00194] Avg episode reward: [(0, '7.540')] [2024-09-01 15:19:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1548288. Throughput: 0: 228.8. Samples: 388646. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:19:20,144][00194] Avg episode reward: [(0, '7.705')] [2024-09-01 15:19:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1552384. Throughput: 0: 224.5. Samples: 389428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:19:25,139][00194] Avg episode reward: [(0, '7.571')] [2024-09-01 15:19:28,646][03034] Updated weights for policy 0, policy_version 380 (0.0036) [2024-09-01 15:19:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1556480. Throughput: 0: 227.3. Samples: 390730. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:19:30,139][00194] Avg episode reward: [(0, '7.616')] [2024-09-01 15:19:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1560576. Throughput: 0: 241.3. Samples: 392472. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:19:35,139][00194] Avg episode reward: [(0, '7.622')] [2024-09-01 15:19:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1564672. Throughput: 0: 228.7. Samples: 392762. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:19:40,146][00194] Avg episode reward: [(0, '7.711')] [2024-09-01 15:19:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1568768. Throughput: 0: 221.4. Samples: 394184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:19:45,144][00194] Avg episode reward: [(0, '7.633')] [2024-09-01 15:19:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1576960. Throughput: 0: 236.8. Samples: 395842. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:19:50,147][00194] Avg episode reward: [(0, '7.947')] [2024-09-01 15:19:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1576960. Throughput: 0: 238.4. Samples: 396488. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:19:55,143][00194] Avg episode reward: [(0, '8.099')] [2024-09-01 15:20:00,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1581056. Throughput: 0: 222.7. Samples: 397638. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:00,141][00194] Avg episode reward: [(0, '8.349')] [2024-09-01 15:20:04,372][03021] Saving new best policy, reward=8.349! [2024-09-01 15:20:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1589248. Throughput: 0: 229.9. Samples: 398990. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:05,147][00194] Avg episode reward: [(0, '8.654')] [2024-09-01 15:20:08,131][03021] Saving new best policy, reward=8.654! [2024-09-01 15:20:10,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1593344. Throughput: 0: 232.7. Samples: 399902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:10,171][00194] Avg episode reward: [(0, '8.769')] [2024-09-01 15:20:13,538][03021] Saving new best policy, reward=8.769! [2024-09-01 15:20:13,562][03034] Updated weights for policy 0, policy_version 390 (0.0562) [2024-09-01 15:20:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1597440. Throughput: 0: 227.4. Samples: 400964. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:15,141][00194] Avg episode reward: [(0, '8.925')] [2024-09-01 15:20:18,534][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000391_1601536.pth... [2024-09-01 15:20:18,642][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000338_1384448.pth [2024-09-01 15:20:18,660][03021] Saving new best policy, reward=8.925! [2024-09-01 15:20:20,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1601536. Throughput: 0: 220.8. Samples: 402406. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:20,139][00194] Avg episode reward: [(0, '8.885')] [2024-09-01 15:20:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1605632. Throughput: 0: 228.4. Samples: 403040. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:25,139][00194] Avg episode reward: [(0, '9.058')] [2024-09-01 15:20:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1609728. Throughput: 0: 235.3. Samples: 404774. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:30,139][00194] Avg episode reward: [(0, '9.159')] [2024-09-01 15:20:31,569][03021] Saving new best policy, reward=9.058! [2024-09-01 15:20:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1613824. Throughput: 0: 221.2. Samples: 405798. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:35,144][00194] Avg episode reward: [(0, '9.114')] [2024-09-01 15:20:36,747][03021] Saving new best policy, reward=9.159! [2024-09-01 15:20:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1617920. Throughput: 0: 221.7. Samples: 406466. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:40,139][00194] Avg episode reward: [(0, '9.150')] [2024-09-01 15:20:45,137][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1626112. Throughput: 0: 233.1. Samples: 408126. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:45,141][00194] Avg episode reward: [(0, '9.159')] [2024-09-01 15:20:50,136][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1630208. Throughput: 0: 227.6. Samples: 409234. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:50,143][00194] Avg episode reward: [(0, '9.159')] [2024-09-01 15:20:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1634304. 
Throughput: 0: 222.7. Samples: 409924. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:20:55,138][00194] Avg episode reward: [(0, '8.673')] [2024-09-01 15:20:58,503][03034] Updated weights for policy 0, policy_version 400 (0.1013) [2024-09-01 15:21:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1638400. Throughput: 0: 228.9. Samples: 411264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:21:00,139][00194] Avg episode reward: [(0, '8.707')] [2024-09-01 15:21:00,841][03021] Signal inference workers to stop experience collection... (400 times) [2024-09-01 15:21:00,894][03034] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-09-01 15:21:01,769][03021] Signal inference workers to resume experience collection... (400 times) [2024-09-01 15:21:01,770][03034] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-09-01 15:21:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1642496. Throughput: 0: 236.0. Samples: 413024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:21:05,144][00194] Avg episode reward: [(0, '8.509')] [2024-09-01 15:21:10,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1646592. Throughput: 0: 229.0. Samples: 413346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:21:10,145][00194] Avg episode reward: [(0, '8.860')] [2024-09-01 15:21:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1650688. Throughput: 0: 225.6. Samples: 414924. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:21:15,145][00194] Avg episode reward: [(0, '8.898')] [2024-09-01 15:21:20,136][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1658880. Throughput: 0: 235.2. Samples: 416384. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:21:20,139][00194] Avg episode reward: [(0, '8.862')] [2024-09-01 15:21:25,142][00194] Fps is (10 sec: 1228.0, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 1662976. Throughput: 0: 240.5. Samples: 417292. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:21:25,145][00194] Avg episode reward: [(0, '8.784')] [2024-09-01 15:21:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1667072. Throughput: 0: 225.9. Samples: 418292. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:21:30,145][00194] Avg episode reward: [(0, '8.625')] [2024-09-01 15:21:35,136][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1671168. Throughput: 0: 235.0. Samples: 419808. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:21:35,139][00194] Avg episode reward: [(0, '8.619')] [2024-09-01 15:21:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1675264. Throughput: 0: 235.3. Samples: 420512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:21:40,139][00194] Avg episode reward: [(0, '8.796')] [2024-09-01 15:21:42,049][03034] Updated weights for policy 0, policy_version 410 (0.0512) [2024-09-01 15:21:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1679360. Throughput: 0: 230.7. Samples: 421646. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:21:45,142][00194] Avg episode reward: [(0, '8.972')] [2024-09-01 15:21:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1683456. Throughput: 0: 228.8. Samples: 423320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:21:50,147][00194] Avg episode reward: [(0, '8.799')] [2024-09-01 15:21:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1691648. Throughput: 0: 237.5. Samples: 424032. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:21:55,141][00194] Avg episode reward: [(0, '8.738')] [2024-09-01 15:22:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1695744. Throughput: 0: 232.9. Samples: 425404. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:00,142][00194] Avg episode reward: [(0, '8.934')] [2024-09-01 15:22:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1699840. Throughput: 0: 224.1. Samples: 426470. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:05,138][00194] Avg episode reward: [(0, '8.926')] [2024-09-01 15:22:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 1703936. Throughput: 0: 227.1. Samples: 427512. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:10,139][00194] Avg episode reward: [(0, '9.215')] [2024-09-01 15:22:12,456][03021] Saving new best policy, reward=9.215! [2024-09-01 15:22:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1708032. Throughput: 0: 236.5. Samples: 428934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:15,138][00194] Avg episode reward: [(0, '9.419')] [2024-09-01 15:22:17,425][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth... [2024-09-01 15:22:17,549][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000364_1490944.pth [2024-09-01 15:22:17,568][03021] Saving new best policy, reward=9.419! [2024-09-01 15:22:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1712128. Throughput: 0: 225.7. Samples: 429964. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:22:20,144][00194] Avg episode reward: [(0, '9.369')] [2024-09-01 15:22:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 1716224. Throughput: 0: 221.6. 
Samples: 430486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:22:25,139][00194] Avg episode reward: [(0, '9.617')] [2024-09-01 15:22:26,655][03021] Saving new best policy, reward=9.617! [2024-09-01 15:22:26,665][03034] Updated weights for policy 0, policy_version 420 (0.0538) [2024-09-01 15:22:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1720320. Throughput: 0: 238.7. Samples: 432388. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:30,144][00194] Avg episode reward: [(0, '9.930')] [2024-09-01 15:22:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1724416. Throughput: 0: 227.9. Samples: 433574. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:35,138][00194] Avg episode reward: [(0, '9.955')] [2024-09-01 15:22:35,506][03021] Saving new best policy, reward=9.930! [2024-09-01 15:22:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1728512. Throughput: 0: 222.9. Samples: 434062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:22:40,139][00194] Avg episode reward: [(0, '10.560')] [2024-09-01 15:22:40,830][03021] Saving new best policy, reward=9.955! [2024-09-01 15:22:44,663][03021] Saving new best policy, reward=10.560! [2024-09-01 15:22:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1736704. Throughput: 0: 229.2. Samples: 435716. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:45,138][00194] Avg episode reward: [(0, '10.784')] [2024-09-01 15:22:48,436][03021] Saving new best policy, reward=10.784! [2024-09-01 15:22:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1740800. Throughput: 0: 238.2. Samples: 437188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:22:50,142][00194] Avg episode reward: [(0, '10.897')] [2024-09-01 15:22:53,734][03021] Saving new best policy, reward=10.897! 
[2024-09-01 15:22:55,140][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1744896. Throughput: 0: 227.2. Samples: 437736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:22:55,143][00194] Avg episode reward: [(0, '10.913')]
[2024-09-01 15:22:58,742][03021] Saving new best policy, reward=10.913!
[2024-09-01 15:23:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1748992. Throughput: 0: 218.8. Samples: 438778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:23:00,138][00194] Avg episode reward: [(0, '11.191')]
[2024-09-01 15:23:02,560][03021] Saving new best policy, reward=11.191!
[2024-09-01 15:23:05,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1753088. Throughput: 0: 236.0. Samples: 440584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:05,140][00194] Avg episode reward: [(0, '11.546')]
[2024-09-01 15:23:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1757184. Throughput: 0: 238.4. Samples: 441212. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:10,139][00194] Avg episode reward: [(0, '11.671')]
[2024-09-01 15:23:11,874][03021] Saving new best policy, reward=11.546!
[2024-09-01 15:23:11,903][03034] Updated weights for policy 0, policy_version 430 (0.0095)
[2024-09-01 15:23:11,973][03021] Saving new best policy, reward=11.671!
[2024-09-01 15:23:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1761280. Throughput: 0: 218.6. Samples: 442226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:23:15,148][00194] Avg episode reward: [(0, '11.796')]
[2024-09-01 15:23:16,760][03021] Saving new best policy, reward=11.796!
[2024-09-01 15:23:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1765376. Throughput: 0: 229.0. Samples: 443878. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:23:20,142][00194] Avg episode reward: [(0, '11.817')]
[2024-09-01 15:23:24,269][03021] Saving new best policy, reward=11.817!
[2024-09-01 15:23:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1773568. Throughput: 0: 237.4. Samples: 444746. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:25,145][00194] Avg episode reward: [(0, '11.696')]
[2024-09-01 15:23:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1777664. Throughput: 0: 228.0. Samples: 445978. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:30,138][00194] Avg episode reward: [(0, '11.624')]
[2024-09-01 15:23:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1781760. Throughput: 0: 218.2. Samples: 447008. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:35,144][00194] Avg episode reward: [(0, '11.782')]
[2024-09-01 15:23:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1785856. Throughput: 0: 229.6. Samples: 448068. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:40,141][00194] Avg episode reward: [(0, '11.478')]
[2024-09-01 15:23:45,137][00194] Fps is (10 sec: 819.2, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1789952. Throughput: 0: 236.1. Samples: 449402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:45,143][00194] Avg episode reward: [(0, '11.481')]
[2024-09-01 15:23:50,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1794048. Throughput: 0: 219.1. Samples: 450444. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:23:50,141][00194] Avg episode reward: [(0, '11.190')]
[2024-09-01 15:23:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1798144. Throughput: 0: 220.0. Samples: 451112. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:23:55,140][00194] Avg episode reward: [(0, '11.275')]
[2024-09-01 15:23:57,823][03034] Updated weights for policy 0, policy_version 440 (0.1042)
[2024-09-01 15:24:00,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1802240. Throughput: 0: 224.4. Samples: 452324. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:24:00,142][00194] Avg episode reward: [(0, '10.995')]
[2024-09-01 15:24:05,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1802240. Throughput: 0: 206.5. Samples: 453172. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:24:05,139][00194] Avg episode reward: [(0, '10.939')]
[2024-09-01 15:24:10,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1806336. Throughput: 0: 195.7. Samples: 453554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:24:10,142][00194] Avg episode reward: [(0, '10.673')]
[2024-09-01 15:24:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1810432. Throughput: 0: 201.9. Samples: 455062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:24:15,138][00194] Avg episode reward: [(0, '10.622')]
[2024-09-01 15:24:18,898][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000444_1818624.pth...
[2024-09-01 15:24:19,000][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000391_1601536.pth
[2024-09-01 15:24:20,136][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1818624. Throughput: 0: 210.0. Samples: 456458. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:24:20,140][00194] Avg episode reward: [(0, '10.791')]
[2024-09-01 15:24:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1822720. Throughput: 0: 203.5. Samples: 457226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:24:25,138][00194] Avg episode reward: [(0, '10.800')]
[2024-09-01 15:24:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1826816. Throughput: 0: 196.8. Samples: 458258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:24:30,139][00194] Avg episode reward: [(0, '10.674')]
[2024-09-01 15:24:35,139][00194] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1830912. Throughput: 0: 206.7. Samples: 459744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:24:35,142][00194] Avg episode reward: [(0, '10.785')]
[2024-09-01 15:24:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1835008. Throughput: 0: 205.1. Samples: 460340. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:24:40,150][00194] Avg episode reward: [(0, '10.915')]
[2024-09-01 15:24:45,137][00194] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1839104. Throughput: 0: 206.4. Samples: 461610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:24:45,144][00194] Avg episode reward: [(0, '10.841')]
[2024-09-01 15:24:47,754][03034] Updated weights for policy 0, policy_version 450 (0.1988)
[2024-09-01 15:24:50,104][03021] Signal inference workers to stop experience collection... (450 times)
[2024-09-01 15:24:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1843200. Throughput: 0: 220.3. Samples: 463084. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:24:50,139][00194] Avg episode reward: [(0, '10.751')]
[2024-09-01 15:24:50,176][03034] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-09-01 15:24:51,090][03021] Signal inference workers to resume experience collection... (450 times)
[2024-09-01 15:24:51,092][03034] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-09-01 15:24:55,136][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1851392. Throughput: 0: 227.1. Samples: 463774. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:24:55,138][00194] Avg episode reward: [(0, '10.648')]
[2024-09-01 15:25:00,138][00194] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1855488. Throughput: 0: 230.3. Samples: 465428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:25:00,141][00194] Avg episode reward: [(0, '10.846')]
[2024-09-01 15:25:05,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1855488. Throughput: 0: 221.8. Samples: 466438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:25:05,143][00194] Avg episode reward: [(0, '11.166')]
[2024-09-01 15:25:10,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1863680. Throughput: 0: 225.9. Samples: 467392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:10,139][00194] Avg episode reward: [(0, '11.250')]
[2024-09-01 15:25:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1867776. Throughput: 0: 233.7. Samples: 468776. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:15,140][00194] Avg episode reward: [(0, '11.464')]
[2024-09-01 15:25:20,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1871872. Throughput: 0: 223.7. Samples: 469810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:25:20,143][00194] Avg episode reward: [(0, '11.591')]
[2024-09-01 15:25:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1875968. Throughput: 0: 226.4. Samples: 470528. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:25:25,138][00194] Avg episode reward: [(0, '11.710')]
[2024-09-01 15:25:30,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1880064. Throughput: 0: 232.8. Samples: 472086. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:30,141][00194] Avg episode reward: [(0, '11.390')]
[2024-09-01 15:25:31,140][03034] Updated weights for policy 0, policy_version 460 (0.1982)
[2024-09-01 15:25:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1884160. Throughput: 0: 229.9. Samples: 473428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:35,139][00194] Avg episode reward: [(0, '11.091')]
[2024-09-01 15:25:40,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1888256. Throughput: 0: 225.8. Samples: 473934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:40,144][00194] Avg episode reward: [(0, '11.271')]
[2024-09-01 15:25:45,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1896448. Throughput: 0: 226.7. Samples: 475628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:45,144][00194] Avg episode reward: [(0, '11.524')]
[2024-09-01 15:25:50,136][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1900544. Throughput: 0: 235.2. Samples: 477024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:25:50,141][00194] Avg episode reward: [(0, '11.603')]
[2024-09-01 15:25:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1904640. Throughput: 0: 229.6. Samples: 477724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:25:55,143][00194] Avg episode reward: [(0, '11.826')]
[2024-09-01 15:25:58,989][03021] Saving new best policy, reward=11.826!
[2024-09-01 15:26:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1908736. Throughput: 0: 220.5. Samples: 478700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:26:00,138][00194] Avg episode reward: [(0, '11.704')]
[2024-09-01 15:26:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1912832. Throughput: 0: 236.1. Samples: 480434. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:26:05,139][00194] Avg episode reward: [(0, '11.518')]
[2024-09-01 15:26:10,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1916928. Throughput: 0: 234.1. Samples: 481064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:26:10,145][00194] Avg episode reward: [(0, '11.552')]
[2024-09-01 15:26:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1921024. Throughput: 0: 221.6. Samples: 482058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:26:15,139][00194] Avg episode reward: [(0, '11.207')]
[2024-09-01 15:26:17,400][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000470_1925120.pth...
[2024-09-01 15:26:17,404][03034] Updated weights for policy 0, policy_version 470 (0.0541)
[2024-09-01 15:26:17,508][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth
[2024-09-01 15:26:20,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1925120. Throughput: 0: 230.8. Samples: 483814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:26:20,146][00194] Avg episode reward: [(0, '11.148')]
[2024-09-01 15:26:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1933312. Throughput: 0: 234.7. Samples: 484496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:26:25,139][00194] Avg episode reward: [(0, '11.199')]
[2024-09-01 15:26:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1933312. Throughput: 0: 222.9. Samples: 485660. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:26:30,138][00194] Avg episode reward: [(0, '11.042')]
[2024-09-01 15:26:35,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1937408. Throughput: 0: 222.0. Samples: 487012. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:26:35,139][00194] Avg episode reward: [(0, '11.254')]
[2024-09-01 15:26:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1945600. Throughput: 0: 221.3. Samples: 487684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:26:40,139][00194] Avg episode reward: [(0, '11.548')]
[2024-09-01 15:26:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1949696. Throughput: 0: 231.0. Samples: 489094. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:26:45,138][00194] Avg episode reward: [(0, '11.684')]
[2024-09-01 15:26:50,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1953792. Throughput: 0: 215.8. Samples: 490144. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:26:50,146][00194] Avg episode reward: [(0, '11.510')]
[2024-09-01 15:26:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1957888. Throughput: 0: 223.5. Samples: 491122. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:26:55,139][00194] Avg episode reward: [(0, '12.261')]
[2024-09-01 15:26:57,382][03021] Saving new best policy, reward=12.261!
[2024-09-01 15:27:00,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1961984. Throughput: 0: 235.3. Samples: 492648. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:00,150][00194] Avg episode reward: [(0, '11.878')]
[2024-09-01 15:27:01,620][03034] Updated weights for policy 0, policy_version 480 (0.1911)
[2024-09-01 15:27:05,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1966080. Throughput: 0: 224.6. Samples: 493920. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:05,143][00194] Avg episode reward: [(0, '12.115')]
[2024-09-01 15:27:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 1970176. Throughput: 0: 216.1. Samples: 494220. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:10,138][00194] Avg episode reward: [(0, '12.094')]
[2024-09-01 15:27:15,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1974272. Throughput: 0: 225.1. Samples: 495788. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:15,144][00194] Avg episode reward: [(0, '12.213')]
[2024-09-01 15:27:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1982464. Throughput: 0: 214.5. Samples: 496664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:20,139][00194] Avg episode reward: [(0, '12.239')]
[2024-09-01 15:27:25,137][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1986560. Throughput: 0: 225.0. Samples: 497810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:25,144][00194] Avg episode reward: [(0, '12.707')]
[2024-09-01 15:27:29,334][03021] Saving new best policy, reward=12.707!
[2024-09-01 15:27:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1990656. Throughput: 0: 224.4. Samples: 499190. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:27:30,138][00194] Avg episode reward: [(0, '12.989')]
[2024-09-01 15:27:33,268][03021] Saving new best policy, reward=12.989!
[2024-09-01 15:27:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1994752. Throughput: 0: 236.7. Samples: 500796. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:27:35,142][00194] Avg episode reward: [(0, '13.343')]
[2024-09-01 15:27:37,268][03021] Saving new best policy, reward=13.343!
[2024-09-01 15:27:40,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1998848. Throughput: 0: 226.3. Samples: 501308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:27:40,142][00194] Avg episode reward: [(0, '13.433')]
[2024-09-01 15:27:43,086][03021] Saving new best policy, reward=13.433!
[2024-09-01 15:27:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2002944. Throughput: 0: 215.4. Samples: 502342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:27:45,139][00194] Avg episode reward: [(0, '13.881')]
[2024-09-01 15:27:47,477][03021] Saving new best policy, reward=13.881!
[2024-09-01 15:27:47,484][03034] Updated weights for policy 0, policy_version 490 (0.1174)
[2024-09-01 15:27:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2007040. Throughput: 0: 227.7. Samples: 504166. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:27:50,141][00194] Avg episode reward: [(0, '14.077')]
[2024-09-01 15:27:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2011136. Throughput: 0: 235.4. Samples: 504812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:27:55,139][00194] Avg episode reward: [(0, '14.071')]
[2024-09-01 15:27:55,360][03021] Saving new best policy, reward=14.077!
[2024-09-01 15:28:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2015232. Throughput: 0: 229.2. Samples: 506100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:28:00,139][00194] Avg episode reward: [(0, '14.053')]
[2024-09-01 15:28:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2019328. Throughput: 0: 238.3. Samples: 507386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:05,139][00194] Avg episode reward: [(0, '13.822')]
[2024-09-01 15:28:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2027520. Throughput: 0: 231.3. Samples: 508216. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:28:10,138][00194] Avg episode reward: [(0, '13.810')]
[2024-09-01 15:28:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2031616. Throughput: 0: 227.8. Samples: 509440. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:28:15,139][00194] Avg episode reward: [(0, '14.038')]
[2024-09-01 15:28:19,560][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth...
[2024-09-01 15:28:19,678][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000444_1818624.pth
[2024-09-01 15:28:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2035712. Throughput: 0: 214.7. Samples: 510456. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:20,139][00194] Avg episode reward: [(0, '13.959')]
[2024-09-01 15:28:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2039808. Throughput: 0: 226.7. Samples: 511510. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:25,139][00194] Avg episode reward: [(0, '14.159')]
[2024-09-01 15:28:27,379][03021] Saving new best policy, reward=14.159!
[2024-09-01 15:28:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2043904. Throughput: 0: 237.9. Samples: 513046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:30,145][00194] Avg episode reward: [(0, '13.950')]
[2024-09-01 15:28:31,832][03034] Updated weights for policy 0, policy_version 500 (0.0071)
[2024-09-01 15:28:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2048000. Throughput: 0: 224.8. Samples: 514282. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:35,140][00194] Avg episode reward: [(0, '13.903')]
[2024-09-01 15:28:35,815][03021] Signal inference workers to stop experience collection... (500 times)
[2024-09-01 15:28:35,865][03034] InferenceWorker_p0-w0: stopping experience collection (500 times)
[2024-09-01 15:28:37,609][03021] Signal inference workers to resume experience collection... (500 times)
[2024-09-01 15:28:37,610][03034] InferenceWorker_p0-w0: resuming experience collection (500 times)
[2024-09-01 15:28:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2052096. Throughput: 0: 217.3. Samples: 514592. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:28:40,139][00194] Avg episode reward: [(0, '14.125')]
[2024-09-01 15:28:45,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2056192. Throughput: 0: 230.9. Samples: 516492. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:28:45,141][00194] Avg episode reward: [(0, '13.500')]
[2024-09-01 15:28:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2064384. Throughput: 0: 228.4. Samples: 517662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:50,143][00194] Avg episode reward: [(0, '13.957')]
[2024-09-01 15:28:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2064384. Throughput: 0: 224.3. Samples: 518310. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:28:55,138][00194] Avg episode reward: [(0, '13.801')]
[2024-09-01 15:29:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2072576. Throughput: 0: 227.5. Samples: 519678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:29:00,139][00194] Avg episode reward: [(0, '13.794')]
[2024-09-01 15:29:05,136][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2076672. Throughput: 0: 241.3. Samples: 521314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:05,141][00194] Avg episode reward: [(0, '13.400')]
[2024-09-01 15:29:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2080768. Throughput: 0: 227.2. Samples: 521736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:10,141][00194] Avg episode reward: [(0, '13.401')]
[2024-09-01 15:29:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2084864. Throughput: 0: 215.9. Samples: 522762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:15,147][00194] Avg episode reward: [(0, '13.225')]
[2024-09-01 15:29:17,868][03034] Updated weights for policy 0, policy_version 510 (0.1507)
[2024-09-01 15:29:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2088960. Throughput: 0: 228.1. Samples: 524548. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:20,145][00194] Avg episode reward: [(0, '12.847')]
[2024-09-01 15:29:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2093056. Throughput: 0: 240.8. Samples: 525426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:25,138][00194] Avg episode reward: [(0, '13.575')]
[2024-09-01 15:29:30,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2097152. Throughput: 0: 221.0. Samples: 526438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:30,141][00194] Avg episode reward: [(0, '13.734')]
[2024-09-01 15:29:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2101248. Throughput: 0: 228.0. Samples: 527920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:35,138][00194] Avg episode reward: [(0, '13.735')]
[2024-09-01 15:29:40,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2109440. Throughput: 0: 235.6. Samples: 528910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:40,145][00194] Avg episode reward: [(0, '13.145')]
[2024-09-01 15:29:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2113536. Throughput: 0: 227.3. Samples: 529906. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:45,144][00194] Avg episode reward: [(0, '13.358')]
[2024-09-01 15:29:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2117632. Throughput: 0: 215.3. Samples: 531004. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:50,148][00194] Avg episode reward: [(0, '13.130')]
[2024-09-01 15:29:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2121728. Throughput: 0: 228.3. Samples: 532010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:29:55,141][00194] Avg episode reward: [(0, '12.993')]
[2024-09-01 15:30:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2125824. Throughput: 0: 241.3. Samples: 533622. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:00,147][00194] Avg episode reward: [(0, '12.821')]
[2024-09-01 15:30:02,181][03034] Updated weights for policy 0, policy_version 520 (0.0530)
[2024-09-01 15:30:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2129920. Throughput: 0: 223.8. Samples: 534620. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:05,139][00194] Avg episode reward: [(0, '12.767')]
[2024-09-01 15:30:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2134016. Throughput: 0: 214.4. Samples: 535076. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:10,140][00194] Avg episode reward: [(0, '12.534')]
[2024-09-01 15:30:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2138112. Throughput: 0: 233.9. Samples: 536962. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:15,139][00194] Avg episode reward: [(0, '12.672')]
[2024-09-01 15:30:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2142208. Throughput: 0: 225.6. Samples: 538074. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:20,140][00194] Avg episode reward: [(0, '12.575')]
[2024-09-01 15:30:20,716][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth...
[2024-09-01 15:30:20,864][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000470_1925120.pth
[2024-09-01 15:30:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2146304. Throughput: 0: 214.1. Samples: 538544. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:25,147][00194] Avg episode reward: [(0, '12.826')]
[2024-09-01 15:30:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 2154496. Throughput: 0: 227.7. Samples: 540154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:30,139][00194] Avg episode reward: [(0, '13.402')]
[2024-09-01 15:30:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2158592. Throughput: 0: 236.4. Samples: 541642. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:35,143][00194] Avg episode reward: [(0, '13.861')]
[2024-09-01 15:30:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2162688. Throughput: 0: 227.0. Samples: 542226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:30:40,144][00194] Avg episode reward: [(0, '13.827')]
[2024-09-01 15:30:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2166784. Throughput: 0: 214.1. Samples: 543256. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:30:45,138][00194] Avg episode reward: [(0, '14.206')]
[2024-09-01 15:30:47,657][03021] Saving new best policy, reward=14.206!
[2024-09-01 15:30:47,662][03034] Updated weights for policy 0, policy_version 530 (0.2030)
[2024-09-01 15:30:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2170880. Throughput: 0: 232.8. Samples: 545094. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:30:50,145][00194] Avg episode reward: [(0, '14.067')]
[2024-09-01 15:30:55,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2174976. Throughput: 0: 235.2. Samples: 545660. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:30:55,148][00194] Avg episode reward: [(0, '14.558')]
[2024-09-01 15:31:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2179072. Throughput: 0: 215.9. Samples: 546676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:31:00,139][00194] Avg episode reward: [(0, '14.374')]
[2024-09-01 15:31:01,840][03021] Saving new best policy, reward=14.558!
[2024-09-01 15:31:05,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2183168. Throughput: 0: 230.2. Samples: 548432. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:05,138][00194] Avg episode reward: [(0, '15.056')]
[2024-09-01 15:31:09,508][03021] Saving new best policy, reward=15.056!
[2024-09-01 15:31:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2191360. Throughput: 0: 236.4. Samples: 549184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:10,144][00194] Avg episode reward: [(0, '15.075')]
[2024-09-01 15:31:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2191360. Throughput: 0: 228.0. Samples: 550412. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:15,142][00194] Avg episode reward: [(0, '14.909')]
[2024-09-01 15:31:15,359][03021] Saving new best policy, reward=15.075!
[2024-09-01 15:31:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2199552. Throughput: 0: 218.2. Samples: 551462. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:20,140][00194] Avg episode reward: [(0, '15.050')]
[2024-09-01 15:31:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2203648. Throughput: 0: 225.3. Samples: 552364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:25,139][00194] Avg episode reward: [(0, '14.934')]
[2024-09-01 15:31:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2207744. Throughput: 0: 229.3. Samples: 553574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:30,138][00194] Avg episode reward: [(0, '15.070')]
[2024-09-01 15:31:33,184][03034] Updated weights for policy 0, policy_version 540 (0.1215)
[2024-09-01 15:31:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2211840. Throughput: 0: 216.8. Samples: 554852. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:35,139][00194] Avg episode reward: [(0, '15.499')]
[2024-09-01 15:31:37,828][03021] Saving new best policy, reward=15.499!
[2024-09-01 15:31:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2215936. Throughput: 0: 219.5. Samples: 555536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:40,143][00194] Avg episode reward: [(0, '16.104')]
[2024-09-01 15:31:41,799][03021] Saving new best policy, reward=16.104!
[2024-09-01 15:31:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2220032. Throughput: 0: 236.4. Samples: 557316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:45,147][00194] Avg episode reward: [(0, '16.763')]
[2024-09-01 15:31:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2224128. Throughput: 0: 220.5. Samples: 558356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:50,144][00194] Avg episode reward: [(0, '16.703')]
[2024-09-01 15:31:51,298][03021] Saving new best policy, reward=16.763!
[2024-09-01 15:31:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2228224. Throughput: 0: 216.8. Samples: 558938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:31:55,138][00194] Avg episode reward: [(0, '16.208')]
[2024-09-01 15:32:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2236416. Throughput: 0: 229.5. Samples: 560740. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:00,138][00194] Avg episode reward: [(0, '16.198')]
[2024-09-01 15:32:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2240512. Throughput: 0: 234.5. Samples: 562014. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:32:05,143][00194] Avg episode reward: [(0, '16.078')]
[2024-09-01 15:32:10,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 2244608. Throughput: 0: 230.2. Samples: 562722. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:32:10,146][00194] Avg episode reward: [(0, '16.188')]
[2024-09-01 15:32:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2248704. Throughput: 0: 224.6. Samples: 563680. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:32:15,143][00194] Avg episode reward: [(0, '16.064')]
[2024-09-01 15:32:17,536][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth...
[2024-09-01 15:32:17,541][03034] Updated weights for policy 0, policy_version 550 (0.0679)
[2024-09-01 15:32:17,645][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth
[2024-09-01 15:32:19,875][03021] Signal inference workers to stop experience collection... (550 times)
[2024-09-01 15:32:19,925][03034] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2024-09-01 15:32:20,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2252800. Throughput: 0: 238.3. Samples: 565576. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:32:20,147][00194] Avg episode reward: [(0, '15.370')]
[2024-09-01 15:32:21,340][03021] Signal inference workers to resume experience collection... (550 times)
[2024-09-01 15:32:21,342][03034] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2024-09-01 15:32:25,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2256896. Throughput: 0: 230.9. Samples: 565930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:25,147][00194] Avg episode reward: [(0, '15.204')]
[2024-09-01 15:32:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2260992. Throughput: 0: 218.4. Samples: 567144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:30,140][00194] Avg episode reward: [(0, '14.805')]
[2024-09-01 15:32:35,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2265088. Throughput: 0: 233.0. Samples: 568840. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:35,141][00194] Avg episode reward: [(0, '14.463')]
[2024-09-01 15:32:40,141][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 2273280. Throughput: 0: 235.0. Samples: 569516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:40,143][00194] Avg episode reward: [(0, '14.651')]
[2024-09-01 15:32:45,142][00194] Fps is (10 sec: 1228.0, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 2277376. Throughput: 0: 225.2. Samples: 570874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:45,145][00194] Avg episode reward: [(0, '14.633')]
[2024-09-01 15:32:50,138][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2281472. Throughput: 0: 219.1. Samples: 571872. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:50,147][00194] Avg episode reward: [(0, '14.797')]
[2024-09-01 15:32:55,136][00194] Fps is (10 sec: 819.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2285568. Throughput: 0: 226.4. Samples: 572908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:32:55,143][00194] Avg episode reward: [(0, '14.419')]
[2024-09-01 15:33:00,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2289664. Throughput: 0: 235.5. Samples: 574278.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:00,142][00194] Avg episode reward: [(0, '14.241')] [2024-09-01 15:33:03,758][03034] Updated weights for policy 0, policy_version 560 (0.2684) [2024-09-01 15:33:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2293760. Throughput: 0: 213.2. Samples: 575170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:05,143][00194] Avg episode reward: [(0, '13.690')] [2024-09-01 15:33:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2297856. Throughput: 0: 223.6. Samples: 575992. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:10,147][00194] Avg episode reward: [(0, '13.307')] [2024-09-01 15:33:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2301952. Throughput: 0: 240.2. Samples: 577954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:15,145][00194] Avg episode reward: [(0, '13.731')] [2024-09-01 15:33:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2306048. Throughput: 0: 227.3. Samples: 579070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:20,141][00194] Avg episode reward: [(0, '13.223')] [2024-09-01 15:33:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2310144. Throughput: 0: 221.4. Samples: 579478. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:25,144][00194] Avg episode reward: [(0, '13.440')] [2024-09-01 15:33:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2318336. Throughput: 0: 228.3. Samples: 581144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:30,139][00194] Avg episode reward: [(0, '13.476')] [2024-09-01 15:33:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2322432. Throughput: 0: 237.3. Samples: 582552. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:33:35,146][00194] Avg episode reward: [(0, '13.612')] [2024-09-01 15:33:40,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2326528. Throughput: 0: 228.6. Samples: 583194. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:33:40,145][00194] Avg episode reward: [(0, '13.687')] [2024-09-01 15:33:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2330624. Throughput: 0: 222.4. Samples: 584286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:45,139][00194] Avg episode reward: [(0, '14.422')] [2024-09-01 15:33:47,104][03034] Updated weights for policy 0, policy_version 570 (0.0054) [2024-09-01 15:33:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2334720. Throughput: 0: 246.4. Samples: 586260. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:50,146][00194] Avg episode reward: [(0, '14.781')] [2024-09-01 15:33:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2338816. Throughput: 0: 237.9. Samples: 586696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:55,138][00194] Avg episode reward: [(0, '14.698')] [2024-09-01 15:34:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2342912. Throughput: 0: 218.3. Samples: 587778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:00,147][00194] Avg episode reward: [(0, '15.246')] [2024-09-01 15:34:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2351104. Throughput: 0: 228.6. Samples: 589356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:05,138][00194] Avg episode reward: [(0, '15.118')] [2024-09-01 15:34:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2351104. Throughput: 0: 235.2. Samples: 590064. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:10,139][00194] Avg episode reward: [(0, '14.992')] [2024-09-01 15:34:15,139][00194] Fps is (10 sec: 409.5, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2355200. Throughput: 0: 206.6. Samples: 590442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:15,152][00194] Avg episode reward: [(0, '14.992')] [2024-09-01 15:34:20,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2355200. Throughput: 0: 196.9. Samples: 591414. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:20,138][00194] Avg episode reward: [(0, '14.807')] [2024-09-01 15:34:21,310][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth... [2024-09-01 15:34:21,418][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth [2024-09-01 15:34:25,136][00194] Fps is (10 sec: 409.7, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2359296. Throughput: 0: 194.6. Samples: 591950. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:34:25,146][00194] Avg episode reward: [(0, '14.632')] [2024-09-01 15:34:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2367488. Throughput: 0: 203.6. Samples: 593446. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:34:30,138][00194] Avg episode reward: [(0, '14.973')] [2024-09-01 15:34:35,137][00194] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2371584. Throughput: 0: 185.0. Samples: 594584. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:34:35,143][00194] Avg episode reward: [(0, '14.855')] [2024-09-01 15:34:39,882][03034] Updated weights for policy 0, policy_version 580 (0.0564) [2024-09-01 15:34:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2375680. Throughput: 0: 190.8. Samples: 595282. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:40,144][00194] Avg episode reward: [(0, '14.892')] [2024-09-01 15:34:45,136][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2379776. Throughput: 0: 196.6. Samples: 596624. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:45,140][00194] Avg episode reward: [(0, '15.959')] [2024-09-01 15:34:50,139][00194] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 888.6). Total num frames: 2383872. Throughput: 0: 194.1. Samples: 598090. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:50,142][00194] Avg episode reward: [(0, '16.658')] [2024-09-01 15:34:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2387968. Throughput: 0: 189.3. Samples: 598584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:55,138][00194] Avg episode reward: [(0, '16.490')] [2024-09-01 15:35:00,136][00194] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2392064. Throughput: 0: 213.8. Samples: 600064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:35:00,146][00194] Avg episode reward: [(0, '16.349')] [2024-09-01 15:35:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2400256. Throughput: 0: 228.1. Samples: 601678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:05,145][00194] Avg episode reward: [(0, '16.273')] [2024-09-01 15:35:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2400256. Throughput: 0: 232.9. Samples: 602432. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:10,139][00194] Avg episode reward: [(0, '15.851')] [2024-09-01 15:35:15,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2404352. Throughput: 0: 218.2. Samples: 603266. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:15,145][00194] Avg episode reward: [(0, '16.457')] [2024-09-01 15:35:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2412544. Throughput: 0: 229.7. Samples: 604922. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:20,140][00194] Avg episode reward: [(0, '16.213')] [2024-09-01 15:35:22,868][03034] Updated weights for policy 0, policy_version 590 (0.1754) [2024-09-01 15:35:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2416640. Throughput: 0: 232.8. Samples: 605760. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:25,141][00194] Avg episode reward: [(0, '16.573')] [2024-09-01 15:35:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2420736. Throughput: 0: 228.1. Samples: 606888. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:30,139][00194] Avg episode reward: [(0, '16.385')] [2024-09-01 15:35:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2424832. Throughput: 0: 230.5. Samples: 608460. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:35,144][00194] Avg episode reward: [(0, '16.451')] [2024-09-01 15:35:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2428928. Throughput: 0: 235.0. Samples: 609160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:40,145][00194] Avg episode reward: [(0, '16.384')] [2024-09-01 15:35:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2433024. Throughput: 0: 232.8. Samples: 610542. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:45,146][00194] Avg episode reward: [(0, '17.090')] [2024-09-01 15:35:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2437120. Throughput: 0: 224.0. Samples: 611756. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:50,149][00194] Avg episode reward: [(0, '17.433')] [2024-09-01 15:35:51,015][03021] Saving new best policy, reward=17.090! [2024-09-01 15:35:54,856][03021] Saving new best policy, reward=17.433! [2024-09-01 15:35:55,142][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 902.5). Total num frames: 2445312. Throughput: 0: 224.5. Samples: 612538. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:55,151][00194] Avg episode reward: [(0, '17.449')] [2024-09-01 15:35:58,768][03021] Saving new best policy, reward=17.449! [2024-09-01 15:36:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2449408. Throughput: 0: 238.0. Samples: 613974. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:36:00,143][00194] Avg episode reward: [(0, '17.573')] [2024-09-01 15:36:04,424][03021] Saving new best policy, reward=17.573! [2024-09-01 15:36:05,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2453504. Throughput: 0: 225.2. Samples: 615056. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:36:05,138][00194] Avg episode reward: [(0, '17.573')] [2024-09-01 15:36:09,370][03034] Updated weights for policy 0, policy_version 600 (0.0525) [2024-09-01 15:36:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2457600. Throughput: 0: 223.4. Samples: 615814. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:10,140][00194] Avg episode reward: [(0, '17.511')] [2024-09-01 15:36:11,675][03021] Signal inference workers to stop experience collection... (600 times) [2024-09-01 15:36:11,733][03034] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-09-01 15:36:13,146][03021] Signal inference workers to resume experience collection... 
(600 times) [2024-09-01 15:36:13,149][03034] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-09-01 15:36:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2461696. Throughput: 0: 227.2. Samples: 617114. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:15,146][00194] Avg episode reward: [(0, '17.478')] [2024-09-01 15:36:17,116][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000602_2465792.pth... [2024-09-01 15:36:17,234][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth [2024-09-01 15:36:20,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2465792. Throughput: 0: 226.9. Samples: 618672. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:20,141][00194] Avg episode reward: [(0, '17.277')] [2024-09-01 15:36:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2469888. Throughput: 0: 221.2. Samples: 619116. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:25,138][00194] Avg episode reward: [(0, '17.390')] [2024-09-01 15:36:30,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2473984. Throughput: 0: 221.9. Samples: 620528. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:30,152][00194] Avg episode reward: [(0, '16.622')] [2024-09-01 15:36:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2478080. Throughput: 0: 231.1. Samples: 622156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:35,138][00194] Avg episode reward: [(0, '17.234')] [2024-09-01 15:36:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2482176. Throughput: 0: 225.5. Samples: 622684. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:40,139][00194] Avg episode reward: [(0, '17.375')] [2024-09-01 15:36:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2486272. Throughput: 0: 216.7. Samples: 623726. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:45,149][00194] Avg episode reward: [(0, '17.829')] [2024-09-01 15:36:49,590][03021] Saving new best policy, reward=17.829! [2024-09-01 15:36:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2494464. Throughput: 0: 227.0. Samples: 625272. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:50,138][00194] Avg episode reward: [(0, '17.889')] [2024-09-01 15:36:53,746][03021] Saving new best policy, reward=17.889! [2024-09-01 15:36:53,783][03034] Updated weights for policy 0, policy_version 610 (0.1219) [2024-09-01 15:36:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2498560. Throughput: 0: 232.4. Samples: 626274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:55,139][00194] Avg episode reward: [(0, '17.831')] [2024-09-01 15:37:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2502656. Throughput: 0: 226.3. Samples: 627296. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:00,139][00194] Avg episode reward: [(0, '18.390')] [2024-09-01 15:37:03,909][03021] Saving new best policy, reward=18.390! [2024-09-01 15:37:05,137][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2506752. Throughput: 0: 220.3. Samples: 628584. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:05,144][00194] Avg episode reward: [(0, '18.006')] [2024-09-01 15:37:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2510848. Throughput: 0: 227.5. Samples: 629352. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:10,146][00194] Avg episode reward: [(0, '18.789')] [2024-09-01 15:37:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2514944. Throughput: 0: 226.4. Samples: 630718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:15,145][00194] Avg episode reward: [(0, '18.796')] [2024-09-01 15:37:17,807][03021] Saving new best policy, reward=18.789! [2024-09-01 15:37:17,936][03021] Saving new best policy, reward=18.796! [2024-09-01 15:37:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2519040. Throughput: 0: 216.6. Samples: 631904. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:20,138][00194] Avg episode reward: [(0, '18.491')] [2024-09-01 15:37:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2523136. Throughput: 0: 218.2. Samples: 632502. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:25,138][00194] Avg episode reward: [(0, '18.033')] [2024-09-01 15:37:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2527232. Throughput: 0: 235.9. Samples: 634340. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:30,140][00194] Avg episode reward: [(0, '18.248')] [2024-09-01 15:37:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.8). Total num frames: 2531328. Throughput: 0: 224.7. Samples: 635384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:35,158][00194] Avg episode reward: [(0, '18.016')] [2024-09-01 15:37:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2535424. Throughput: 0: 214.1. Samples: 635908. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:40,139][00194] Avg episode reward: [(0, '18.389')] [2024-09-01 15:37:41,074][03034] Updated weights for policy 0, policy_version 620 (0.1017) [2024-09-01 15:37:45,137][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2543616. Throughput: 0: 229.6. Samples: 637630. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:45,140][00194] Avg episode reward: [(0, '17.935')] [2024-09-01 15:37:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2547712. Throughput: 0: 225.9. Samples: 638748. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:37:50,144][00194] Avg episode reward: [(0, '17.932')] [2024-09-01 15:37:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2551808. Throughput: 0: 223.8. Samples: 639422. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:37:55,140][00194] Avg episode reward: [(0, '17.345')] [2024-09-01 15:38:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2555904. Throughput: 0: 221.8. Samples: 640700. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:00,140][00194] Avg episode reward: [(0, '17.246')] [2024-09-01 15:38:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2560000. Throughput: 0: 237.5. Samples: 642592. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:05,138][00194] Avg episode reward: [(0, '17.003')] [2024-09-01 15:38:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2564096. Throughput: 0: 228.9. Samples: 642804. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:10,139][00194] Avg episode reward: [(0, '16.978')] [2024-09-01 15:38:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2568192. Throughput: 0: 213.9. Samples: 643964. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:15,141][00194] Avg episode reward: [(0, '16.547')] [2024-09-01 15:38:19,957][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth... [2024-09-01 15:38:20,074][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth [2024-09-01 15:38:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2576384. Throughput: 0: 228.1. Samples: 645646. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:20,144][00194] Avg episode reward: [(0, '16.768')] [2024-09-01 15:38:24,865][03034] Updated weights for policy 0, policy_version 630 (0.1022) [2024-09-01 15:38:25,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2580480. Throughput: 0: 237.9. Samples: 646612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:25,146][00194] Avg episode reward: [(0, '16.866')] [2024-09-01 15:38:30,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2580480. Throughput: 0: 221.6. Samples: 647600. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:30,146][00194] Avg episode reward: [(0, '16.842')] [2024-09-01 15:38:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.8, 300 sec: 888.6). Total num frames: 2588672. Throughput: 0: 227.3. Samples: 648976. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:38:35,143][00194] Avg episode reward: [(0, '16.657')] [2024-09-01 15:38:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2592768. Throughput: 0: 229.5. Samples: 649750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:38:40,143][00194] Avg episode reward: [(0, '16.729')] [2024-09-01 15:38:45,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2596864. Throughput: 0: 228.4. Samples: 650980. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:45,150][00194] Avg episode reward: [(0, '16.740')] [2024-09-01 15:38:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2600960. Throughput: 0: 217.1. Samples: 652362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:50,139][00194] Avg episode reward: [(0, '17.368')] [2024-09-01 15:38:55,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2605056. Throughput: 0: 228.0. Samples: 653062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:38:55,143][00194] Avg episode reward: [(0, '17.468')] [2024-09-01 15:39:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2609152. Throughput: 0: 240.8. Samples: 654800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:00,141][00194] Avg episode reward: [(0, '17.162')] [2024-09-01 15:39:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2613248. Throughput: 0: 224.4. Samples: 655744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:05,142][00194] Avg episode reward: [(0, '17.014')] [2024-09-01 15:39:09,782][03034] Updated weights for policy 0, policy_version 640 (0.0048) [2024-09-01 15:39:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2621440. Throughput: 0: 219.4. Samples: 656484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:10,141][00194] Avg episode reward: [(0, '17.488')] [2024-09-01 15:39:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2625536. Throughput: 0: 228.6. Samples: 657888. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:15,141][00194] Avg episode reward: [(0, '17.710')] [2024-09-01 15:39:20,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2625536. Throughput: 0: 220.0. Samples: 658876. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:20,141][00194] Avg episode reward: [(0, '18.127')] [2024-09-01 15:39:25,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2629632. Throughput: 0: 213.5. Samples: 659356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:25,145][00194] Avg episode reward: [(0, '18.380')] [2024-09-01 15:39:30,136][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2637824. Throughput: 0: 220.2. Samples: 660886. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:30,145][00194] Avg episode reward: [(0, '18.273')] [2024-09-01 15:39:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2641920. Throughput: 0: 222.2. Samples: 662362. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:35,144][00194] Avg episode reward: [(0, '17.777')] [2024-09-01 15:39:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2646016. Throughput: 0: 219.1. Samples: 662922. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:40,139][00194] Avg episode reward: [(0, '18.493')] [2024-09-01 15:39:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2650112. Throughput: 0: 204.6. Samples: 664006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:45,139][00194] Avg episode reward: [(0, '18.301')] [2024-09-01 15:39:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2654208. Throughput: 0: 228.3. Samples: 666016. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:50,146][00194] Avg episode reward: [(0, '18.586')] [2024-09-01 15:39:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2658304. Throughput: 0: 220.9. Samples: 666426. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:55,142][00194] Avg episode reward: [(0, '19.128')] [2024-09-01 15:39:56,763][03021] Saving new best policy, reward=19.128! [2024-09-01 15:39:56,753][03034] Updated weights for policy 0, policy_version 650 (0.1637) [2024-09-01 15:40:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2662400. Throughput: 0: 215.0. Samples: 667564. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:40:00,138][00194] Avg episode reward: [(0, '19.077')] [2024-09-01 15:40:00,191][03021] Signal inference workers to stop experience collection... (650 times) [2024-09-01 15:40:00,249][03034] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-09-01 15:40:01,517][03021] Signal inference workers to resume experience collection... (650 times) [2024-09-01 15:40:01,519][03034] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-09-01 15:40:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2666496. Throughput: 0: 228.1. Samples: 669140. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:40:05,138][00194] Avg episode reward: [(0, '18.654')] [2024-09-01 15:40:10,136][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2674688. Throughput: 0: 237.8. Samples: 670056. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:40:10,143][00194] Avg episode reward: [(0, '18.396')] [2024-09-01 15:40:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2678784. Throughput: 0: 225.3. Samples: 671024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:40:15,141][00194] Avg episode reward: [(0, '17.993')] [2024-09-01 15:40:19,739][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000655_2682880.pth... 
[2024-09-01 15:40:19,847][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000602_2465792.pth
[2024-09-01 15:40:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2682880. Throughput: 0: 218.4. Samples: 672188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:20,143][00194] Avg episode reward: [(0, '17.373')]
[2024-09-01 15:40:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2686976. Throughput: 0: 228.8. Samples: 673218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:25,138][00194] Avg episode reward: [(0, '17.249')]
[2024-09-01 15:40:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2691072. Throughput: 0: 235.5. Samples: 674604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:30,145][00194] Avg episode reward: [(0, '17.045')]
[2024-09-01 15:40:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2695168. Throughput: 0: 214.0. Samples: 675644. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:35,142][00194] Avg episode reward: [(0, '17.239')]
[2024-09-01 15:40:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2699264. Throughput: 0: 219.5. Samples: 676302. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:40,142][00194] Avg episode reward: [(0, '17.996')]
[2024-09-01 15:40:42,042][03034] Updated weights for policy 0, policy_version 660 (0.2248)
[2024-09-01 15:40:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2703360. Throughput: 0: 232.5. Samples: 678028. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:45,145][00194] Avg episode reward: [(0, '17.965')]
[2024-09-01 15:40:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2707456. Throughput: 0: 224.2. Samples: 679230. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:50,142][00194] Avg episode reward: [(0, '17.388')]
[2024-09-01 15:40:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2711552. Throughput: 0: 215.1. Samples: 679734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:55,141][00194] Avg episode reward: [(0, '17.932')]
[2024-09-01 15:41:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2715648. Throughput: 0: 226.7. Samples: 681224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:41:00,148][00194] Avg episode reward: [(0, '17.388')]
[2024-09-01 15:41:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2719744. Throughput: 0: 228.8. Samples: 682486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:05,138][00194] Avg episode reward: [(0, '17.630')]
[2024-09-01 15:41:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2723840. Throughput: 0: 215.4. Samples: 682910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:10,143][00194] Avg episode reward: [(0, '17.784')]
[2024-09-01 15:41:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2727936. Throughput: 0: 211.7. Samples: 684130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:15,145][00194] Avg episode reward: [(0, '18.055')]
[2024-09-01 15:41:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2736128. Throughput: 0: 222.1. Samples: 685640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:20,147][00194] Avg episode reward: [(0, '18.308')]
[2024-09-01 15:41:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2740224. Throughput: 0: 228.4. Samples: 686580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:25,142][00194] Avg episode reward: [(0, '18.215')]
[2024-09-01 15:41:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2744320. Throughput: 0: 212.9. Samples: 687610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:30,146][00194] Avg episode reward: [(0, '17.663')]
[2024-09-01 15:41:30,168][03034] Updated weights for policy 0, policy_version 670 (0.2184)
[2024-09-01 15:41:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2748416. Throughput: 0: 219.0. Samples: 689086. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:35,144][00194] Avg episode reward: [(0, '17.081')]
[2024-09-01 15:41:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2752512. Throughput: 0: 222.7. Samples: 689756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:40,138][00194] Avg episode reward: [(0, '17.206')]
[2024-09-01 15:41:45,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2756608. Throughput: 0: 217.5. Samples: 691010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:45,145][00194] Avg episode reward: [(0, '17.206')]
[2024-09-01 15:41:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2760704. Throughput: 0: 217.5. Samples: 692274. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:50,146][00194] Avg episode reward: [(0, '17.321')]
[2024-09-01 15:41:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2764800. Throughput: 0: 227.1. Samples: 693128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:55,141][00194] Avg episode reward: [(0, '17.597')]
[2024-09-01 15:42:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2768896. Throughput: 0: 235.4. Samples: 694724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:00,144][00194] Avg episode reward: [(0, '17.215')]
[2024-09-01 15:42:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2772992. Throughput: 0: 224.6. Samples: 695746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:05,138][00194] Avg episode reward: [(0, '17.520')]
[2024-09-01 15:42:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2781184. Throughput: 0: 217.6. Samples: 696372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:42:10,142][00194] Avg episode reward: [(0, '17.605')]
[2024-09-01 15:42:13,787][03034] Updated weights for policy 0, policy_version 680 (0.0583)
[2024-09-01 15:42:15,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2785280. Throughput: 0: 225.9. Samples: 697778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:42:15,141][00194] Avg episode reward: [(0, '16.796')]
[2024-09-01 15:42:19,309][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000681_2789376.pth...
[2024-09-01 15:42:19,422][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth
[2024-09-01 15:42:20,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2789376. Throughput: 0: 217.9. Samples: 698890. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:42:20,141][00194] Avg episode reward: [(0, '16.508')]
[2024-09-01 15:42:25,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2793472. Throughput: 0: 219.1. Samples: 699614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:25,144][00194] Avg episode reward: [(0, '17.590')]
[2024-09-01 15:42:30,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2797568. Throughput: 0: 222.2. Samples: 701008. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:30,143][00194] Avg episode reward: [(0, '16.690')]
[2024-09-01 15:42:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2801664. Throughput: 0: 230.4. Samples: 702642. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:35,139][00194] Avg episode reward: [(0, '16.547')]
[2024-09-01 15:42:40,140][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2805760. Throughput: 0: 217.7. Samples: 702926. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:40,146][00194] Avg episode reward: [(0, '16.577')]
[2024-09-01 15:42:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2809856. Throughput: 0: 216.4. Samples: 704460. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:45,138][00194] Avg episode reward: [(0, '16.622')]
[2024-09-01 15:42:50,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2813952. Throughput: 0: 231.8. Samples: 706178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:50,147][00194] Avg episode reward: [(0, '16.403')]
[2024-09-01 15:42:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2818048. Throughput: 0: 228.3. Samples: 706644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:55,139][00194] Avg episode reward: [(0, '16.105')]
[2024-09-01 15:43:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2822144. Throughput: 0: 222.1. Samples: 707770. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:43:00,144][00194] Avg episode reward: [(0, '16.257')]
[2024-09-01 15:43:01,308][03034] Updated weights for policy 0, policy_version 690 (0.0534)
[2024-09-01 15:43:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2830336. Throughput: 0: 231.7. Samples: 709314. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:43:05,139][00194] Avg episode reward: [(0, '16.304')]
[2024-09-01 15:43:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2834432. Throughput: 0: 238.5. Samples: 710348. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:43:10,143][00194] Avg episode reward: [(0, '16.727')]
[2024-09-01 15:43:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2838528. Throughput: 0: 231.1. Samples: 711408. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:43:15,140][00194] Avg episode reward: [(0, '17.210')]
[2024-09-01 15:43:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2842624. Throughput: 0: 219.7. Samples: 712528. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:20,138][00194] Avg episode reward: [(0, '18.390')]
[2024-09-01 15:43:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2846720. Throughput: 0: 233.8. Samples: 713446. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:25,139][00194] Avg episode reward: [(0, '18.378')]
[2024-09-01 15:43:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2850816. Throughput: 0: 234.7. Samples: 715020. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:30,145][00194] Avg episode reward: [(0, '18.375')]
[2024-09-01 15:43:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2854912. Throughput: 0: 220.0. Samples: 716076. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:35,139][00194] Avg episode reward: [(0, '18.811')]
[2024-09-01 15:43:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2859008. Throughput: 0: 222.2. Samples: 716644. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:40,147][00194] Avg episode reward: [(0, '18.913')]
[2024-09-01 15:43:44,829][03034] Updated weights for policy 0, policy_version 700 (0.2145)
[2024-09-01 15:43:45,138][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2867200. Throughput: 0: 240.0. Samples: 718572. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:45,146][00194] Avg episode reward: [(0, '19.036')]
[2024-09-01 15:43:48,737][03021] Signal inference workers to stop experience collection... (700 times)
[2024-09-01 15:43:48,830][03034] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2024-09-01 15:43:49,947][03021] Signal inference workers to resume experience collection... (700 times)
[2024-09-01 15:43:49,948][03034] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2024-09-01 15:43:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2871296. Throughput: 0: 228.0. Samples: 719576. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:50,139][00194] Avg episode reward: [(0, '19.069')]
[2024-09-01 15:43:55,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2875392. Throughput: 0: 218.9. Samples: 720200. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:55,139][00194] Avg episode reward: [(0, '19.063')]
[2024-09-01 15:44:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2879488. Throughput: 0: 226.7. Samples: 721610. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:00,139][00194] Avg episode reward: [(0, '20.121')]
[2024-09-01 15:44:02,307][03021] Saving new best policy, reward=20.121!
[2024-09-01 15:44:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2883584. Throughput: 0: 239.2. Samples: 723294. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:05,142][00194] Avg episode reward: [(0, '20.386')]
[2024-09-01 15:44:07,824][03021] Saving new best policy, reward=20.386!
[2024-09-01 15:44:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2887680. Throughput: 0: 225.6. Samples: 723600. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:10,144][00194] Avg episode reward: [(0, '20.153')]
[2024-09-01 15:44:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2891776. Throughput: 0: 217.6. Samples: 724810. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:15,139][00194] Avg episode reward: [(0, '20.230')]
[2024-09-01 15:44:16,410][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth...
[2024-09-01 15:44:16,520][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000655_2682880.pth
[2024-09-01 15:44:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2895872. Throughput: 0: 236.0. Samples: 726698. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:20,140][00194] Avg episode reward: [(0, '20.219')]
[2024-09-01 15:44:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2899968. Throughput: 0: 232.0. Samples: 727084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:44:25,142][00194] Avg episode reward: [(0, '19.611')]
[2024-09-01 15:44:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2904064. Throughput: 0: 217.2. Samples: 728346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:44:30,149][00194] Avg episode reward: [(0, '20.497')]
[2024-09-01 15:44:30,732][03034] Updated weights for policy 0, policy_version 710 (0.1086)
[2024-09-01 15:44:34,562][03021] Saving new best policy, reward=20.497!
[2024-09-01 15:44:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2912256. Throughput: 0: 226.9. Samples: 729788. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:35,143][00194] Avg episode reward: [(0, '20.699')]
[2024-09-01 15:44:38,448][03021] Saving new best policy, reward=20.699!
[2024-09-01 15:44:40,140][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2916352. Throughput: 0: 235.6. Samples: 730804. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:40,146][00194] Avg episode reward: [(0, '20.725')]
[2024-09-01 15:44:44,411][03021] Saving new best policy, reward=20.725!
[2024-09-01 15:44:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2920448. Throughput: 0: 228.4. Samples: 731888. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:45,141][00194] Avg episode reward: [(0, '20.492')]
[2024-09-01 15:44:50,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2924544. Throughput: 0: 219.4. Samples: 733168. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:50,139][00194] Avg episode reward: [(0, '20.748')]
[2024-09-01 15:44:52,926][03021] Saving new best policy, reward=20.748!
[2024-09-01 15:44:55,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2928640. Throughput: 0: 229.1. Samples: 733912. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:55,150][00194] Avg episode reward: [(0, '21.064')]
[2024-09-01 15:44:57,026][03021] Saving new best policy, reward=21.064!
[2024-09-01 15:45:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2932736. Throughput: 0: 233.4. Samples: 735312. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:00,142][00194] Avg episode reward: [(0, '20.611')]
[2024-09-01 15:45:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2936832. Throughput: 0: 216.8. Samples: 736456. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:05,140][00194] Avg episode reward: [(0, '20.114')]
[2024-09-01 15:45:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2940928. Throughput: 0: 223.9. Samples: 737160. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:10,141][00194] Avg episode reward: [(0, '19.898')]
[2024-09-01 15:45:14,918][03034] Updated weights for policy 0, policy_version 720 (0.0056)
[2024-09-01 15:45:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2949120. Throughput: 0: 237.5. Samples: 739032. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:15,139][00194] Avg episode reward: [(0, '20.473')]
[2024-09-01 15:45:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2949120. Throughput: 0: 228.0. Samples: 740048. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:20,139][00194] Avg episode reward: [(0, '20.979')]
[2024-09-01 15:45:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2957312. Throughput: 0: 217.4. Samples: 740584. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:25,140][00194] Avg episode reward: [(0, '20.871')]
[2024-09-01 15:45:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2961408. Throughput: 0: 226.6. Samples: 742084. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:30,138][00194] Avg episode reward: [(0, '21.252')]
[2024-09-01 15:45:32,582][03021] Saving new best policy, reward=21.252!
[2024-09-01 15:45:35,139][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2965504. Throughput: 0: 230.3. Samples: 743532. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:35,143][00194] Avg episode reward: [(0, '21.427')]
[2024-09-01 15:45:38,651][03021] Saving new best policy, reward=21.427!
[2024-09-01 15:45:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2969600. Throughput: 0: 225.8. Samples: 744070. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:45:40,138][00194] Avg episode reward: [(0, '21.569')]
[2024-09-01 15:45:42,848][03021] Saving new best policy, reward=21.569!
[2024-09-01 15:45:45,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2973696. Throughput: 0: 223.2. Samples: 745354. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:45:45,144][00194] Avg episode reward: [(0, '21.570')]
[2024-09-01 15:45:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2977792. Throughput: 0: 236.5. Samples: 747098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:45:50,142][00194] Avg episode reward: [(0, '21.226')]
[2024-09-01 15:45:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2981888. Throughput: 0: 230.2. Samples: 747520. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:45:55,142][00194] Avg episode reward: [(0, '21.167')]
[2024-09-01 15:46:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2985984. Throughput: 0: 214.4. Samples: 748678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:00,144][00194] Avg episode reward: [(0, '21.265')]
[2024-09-01 15:46:00,792][03034] Updated weights for policy 0, policy_version 730 (0.0059)
[2024-09-01 15:46:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2994176. Throughput: 0: 225.2. Samples: 750184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:46:05,148][00194] Avg episode reward: [(0, '21.036')]
[2024-09-01 15:46:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2998272. Throughput: 0: 235.3. Samples: 751172. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:10,143][00194] Avg episode reward: [(0, '21.323')]
[2024-09-01 15:46:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3002368. Throughput: 0: 223.9. Samples: 752158. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:15,138][00194] Avg episode reward: [(0, '22.063')]
[2024-09-01 15:46:18,718][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth...
[2024-09-01 15:46:18,823][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000681_2789376.pth
[2024-09-01 15:46:18,834][03021] Saving new best policy, reward=22.063!
[2024-09-01 15:46:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 3006464. Throughput: 0: 224.1. Samples: 753616. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:20,145][00194] Avg episode reward: [(0, '21.768')]
[2024-09-01 15:46:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3010560. Throughput: 0: 226.8. Samples: 754274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:25,142][00194] Avg episode reward: [(0, '21.474')]
[2024-09-01 15:46:30,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3014656. Throughput: 0: 231.1. Samples: 755754. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:30,143][00194] Avg episode reward: [(0, '21.770')]
[2024-09-01 15:46:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3018752. Throughput: 0: 218.1. Samples: 756912. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:35,138][00194] Avg episode reward: [(0, '21.529')]
[2024-09-01 15:46:40,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3022848. Throughput: 0: 225.1. Samples: 757648. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:46:40,147][00194] Avg episode reward: [(0, '22.132')]
[2024-09-01 15:46:44,223][03021] Saving new best policy, reward=22.132!
[2024-09-01 15:46:44,235][03034] Updated weights for policy 0, policy_version 740 (0.1655)
[2024-09-01 15:46:45,144][00194] Fps is (10 sec: 1227.7, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 3031040. Throughput: 0: 238.3. Samples: 759402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:46:45,149][00194] Avg episode reward: [(0, '21.245')]
[2024-09-01 15:46:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3035136. Throughput: 0: 226.7. Samples: 760386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:46:50,138][00194] Avg episode reward: [(0, '21.317')]
[2024-09-01 15:46:55,136][00194] Fps is (10 sec: 819.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3039232. Throughput: 0: 215.9. Samples: 760886. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:46:55,139][00194] Avg episode reward: [(0, '21.480')]
[2024-09-01 15:47:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3043328. Throughput: 0: 229.9. Samples: 762502. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:47:00,148][00194] Avg episode reward: [(0, '20.783')]
[2024-09-01 15:47:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3047424. Throughput: 0: 229.4. Samples: 763938. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:47:05,140][00194] Avg episode reward: [(0, '20.889')]
[2024-09-01 15:47:10,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3051520. Throughput: 0: 224.8. Samples: 764392. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:47:10,145][00194] Avg episode reward: [(0, '20.672')]
[2024-09-01 15:47:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3055616. Throughput: 0: 226.2. Samples: 765932. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:47:15,144][00194] Avg episode reward: [(0, '19.423')]
[2024-09-01 15:47:20,136][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3063808. Throughput: 0: 234.4. Samples: 767462. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:47:20,138][00194] Avg episode reward: [(0, '19.466')]
[2024-09-01 15:47:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3063808. Throughput: 0: 233.9. Samples: 768174. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:47:25,138][00194] Avg episode reward: [(0, '19.382')]
[2024-09-01 15:47:30,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3067904. Throughput: 0: 218.4. Samples: 769230. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:47:30,139][00194] Avg episode reward: [(0, '19.509')]
[2024-09-01 15:47:30,957][03034] Updated weights for policy 0, policy_version 750 (0.2263)
[2024-09-01 15:47:33,253][03021] Signal inference workers to stop experience collection... (750 times)
[2024-09-01 15:47:33,307][03034] InferenceWorker_p0-w0: stopping experience collection (750 times)
[2024-09-01 15:47:34,199][03021] Signal inference workers to resume experience collection... (750 times)
[2024-09-01 15:47:34,201][03034] InferenceWorker_p0-w0: resuming experience collection (750 times)
[2024-09-01 15:47:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3076096. Throughput: 0: 226.4. Samples: 770576. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:47:35,138][00194] Avg episode reward: [(0, '19.567')]
[2024-09-01 15:47:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3080192. Throughput: 0: 236.5. Samples: 771530. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:47:40,146][00194] Avg episode reward: [(0, '19.152')]
[2024-09-01 15:47:45,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 3084288. Throughput: 0: 225.1. Samples: 772634. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:47:45,140][00194] Avg episode reward: [(0, '19.631')]
[2024-09-01 15:47:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3088384. Throughput: 0: 223.1. Samples: 773978. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:47:50,143][00194] Avg episode reward: [(0, '19.936')]
[2024-09-01 15:47:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3092480. Throughput: 0: 231.4. Samples: 774804. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:47:55,147][00194] Avg episode reward: [(0, '19.917')]
[2024-09-01 15:48:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3096576. Throughput: 0: 232.5. Samples: 776396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:48:00,138][00194] Avg episode reward: [(0, '19.718')]
[2024-09-01 15:48:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3100672. Throughput: 0: 222.0. Samples: 777450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:48:05,139][00194] Avg episode reward: [(0, '19.588')]
[2024-09-01 15:48:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3108864. Throughput: 0: 223.5. Samples: 778230. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:48:10,139][00194] Avg episode reward: [(0, '19.439')]
[2024-09-01 15:48:13,726][03034] Updated weights for policy 0, policy_version 760 (0.1012)
[2024-09-01 15:48:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3112960. Throughput: 0: 233.4. Samples: 779732. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:48:15,138][00194] Avg episode reward: [(0, '19.651')]
[2024-09-01 15:48:18,866][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000761_3117056.pth...
[2024-09-01 15:48:18,957][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth
[2024-09-01 15:48:20,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3117056. Throughput: 0: 228.6. Samples: 780862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:48:20,148][00194] Avg episode reward: [(0, '19.440')]
[2024-09-01 15:48:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3121152. Throughput: 0: 223.7. Samples: 781596. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:48:25,139][00194] Avg episode reward: [(0, '19.644')]
[2024-09-01 15:48:30,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3125248. Throughput: 0: 232.1. Samples: 783078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:48:30,138][00194] Avg episode reward: [(0, '20.063')]
[2024-09-01 15:48:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3129344. Throughput: 0: 232.0. Samples: 784416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:48:35,145][00194] Avg episode reward: [(0, '20.287')]
[2024-09-01 15:48:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3133440. Throughput: 0: 226.0. Samples: 784972. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:48:40,140][00194] Avg episode reward: [(0, '19.418')]
[2024-09-01 15:48:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3137536. Throughput: 0: 224.2. Samples: 786486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:48:45,142][00194] Avg episode reward: [(0, '19.444')]
[2024-09-01 15:48:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3145728. Throughput: 0: 232.2. Samples: 787900. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:48:50,140][00194] Avg episode reward: [(0, '19.318')]
[2024-09-01 15:48:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3145728. Throughput: 0: 229.4. Samples: 788552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:48:55,140][00194] Avg episode reward: [(0, '19.341')]
[2024-09-01 15:49:00,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3149824. Throughput: 0: 222.4. Samples: 789742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:49:00,148][00194] Avg episode reward: [(0, '19.888')]
[2024-09-01 15:49:00,843][03034] Updated weights for policy 0, policy_version 770 (0.2036)
[2024-09-01 15:49:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3158016. Throughput: 0: 225.8. Samples: 791024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:49:05,138][00194] Avg episode reward: [(0, '20.481')]
[2024-09-01 15:49:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3162112. Throughput: 0: 230.9. Samples: 791988. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:49:10,139][00194] Avg episode reward: [(0, '21.718')]
[2024-09-01 15:49:15,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3166208. Throughput: 0: 222.5. Samples: 793090. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:49:15,144][00194] Avg episode reward: [(0, '22.134')]
[2024-09-01 15:49:18,338][03021] Saving new best policy, reward=22.134!
[2024-09-01 15:49:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3170304. Throughput: 0: 225.4. Samples: 794558. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:49:20,139][00194] Avg episode reward: [(0, '21.760')]
[2024-09-01 15:49:25,137][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3174400. Throughput: 0: 228.4. Samples: 795248. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:49:25,139][00194] Avg episode reward: [(0, '22.142')]
[2024-09-01 15:49:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3178496. Throughput: 0: 229.4. Samples: 796810. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:49:30,142][00194] Avg episode reward: [(0, '22.131')]
[2024-09-01 15:49:31,648][03021] Saving new best policy, reward=22.142!
[2024-09-01 15:49:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3182592. Throughput: 0: 221.9. Samples: 797884. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:49:35,139][00194] Avg episode reward: [(0, '22.599')]
[2024-09-01 15:49:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3186688. Throughput: 0: 224.4. Samples: 798648. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:49:40,147][00194] Avg episode reward: [(0, '22.329')]
[2024-09-01 15:49:40,183][03021] Saving new best policy, reward=22.599!
[2024-09-01 15:49:44,449][03034] Updated weights for policy 0, policy_version 780 (0.1073)
[2024-09-01 15:49:45,138][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3194880. Throughput: 0: 231.7. Samples: 800170. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:49:45,141][00194] Avg episode reward: [(0, '22.209')]
[2024-09-01 15:49:50,136][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3198976. Throughput: 0: 224.9. Samples: 801146. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:49:50,147][00194] Avg episode reward: [(0, '22.078')]
[2024-09-01 15:49:55,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3203072. Throughput: 0: 219.7. Samples: 801876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:49:55,138][00194] Avg episode reward: [(0, '21.512')]
[2024-09-01 15:50:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3207168. Throughput: 0: 226.1. Samples: 803262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:50:00,146][00194] Avg episode reward: [(0, '21.933')]
[2024-09-01 15:50:05,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3211264. Throughput: 0: 227.3. Samples: 804786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:50:05,144][00194] Avg episode reward: [(0, '22.340')]
[2024-09-01 15:50:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3215360. Throughput: 0: 221.6. Samples: 805222. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:50:10,140][00194] Avg episode reward: [(0, '22.113')]
[2024-09-01 15:50:15,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3219456. Throughput: 0: 214.0. Samples: 806438. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:50:15,144][00194] Avg episode reward: [(0, '22.157')]
[2024-09-01 15:50:16,523][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000787_3223552.pth...
[2024-09-01 15:50:16,635][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth [2024-09-01 15:50:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3223552. Throughput: 0: 232.2. Samples: 808334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:20,145][00194] Avg episode reward: [(0, '22.172')] [2024-09-01 15:50:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3227648. Throughput: 0: 223.5. Samples: 808706. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:25,143][00194] Avg episode reward: [(0, '22.708')] [2024-09-01 15:50:26,192][03021] Saving new best policy, reward=22.708! [2024-09-01 15:50:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3231744. Throughput: 0: 217.7. Samples: 809964. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:30,148][00194] Avg episode reward: [(0, '22.729')] [2024-09-01 15:50:31,092][03034] Updated weights for policy 0, policy_version 790 (0.1658) [2024-09-01 15:50:34,963][03021] Saving new best policy, reward=22.729! [2024-09-01 15:50:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3239936. Throughput: 0: 230.4. Samples: 811516. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:35,140][00194] Avg episode reward: [(0, '22.713')] [2024-09-01 15:50:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3244032. Throughput: 0: 236.8. Samples: 812534. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:40,142][00194] Avg episode reward: [(0, '22.768')] [2024-09-01 15:50:44,838][03021] Saving new best policy, reward=22.768! [2024-09-01 15:50:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3248128. Throughput: 0: 227.6. Samples: 813502. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:45,142][00194] Avg episode reward: [(0, '22.931')] [2024-09-01 15:50:49,345][03021] Saving new best policy, reward=22.931! [2024-09-01 15:50:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3252224. Throughput: 0: 219.9. Samples: 814680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:50:50,145][00194] Avg episode reward: [(0, '22.894')] [2024-09-01 15:50:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3256320. Throughput: 0: 230.9. Samples: 815614. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:50:55,138][00194] Avg episode reward: [(0, '22.140')] [2024-09-01 15:51:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3260416. Throughput: 0: 233.3. Samples: 816938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:00,144][00194] Avg episode reward: [(0, '21.306')] [2024-09-01 15:51:05,140][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3264512. Throughput: 0: 215.8. Samples: 818048. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:05,142][00194] Avg episode reward: [(0, '21.112')] [2024-09-01 15:51:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3268608. Throughput: 0: 223.4. Samples: 818760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:10,148][00194] Avg episode reward: [(0, '21.764')] [2024-09-01 15:51:15,040][03034] Updated weights for policy 0, policy_version 800 (0.0701) [2024-09-01 15:51:15,136][00194] Fps is (10 sec: 1229.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3276800. Throughput: 0: 239.1. Samples: 820724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:51:15,140][00194] Avg episode reward: [(0, '21.618')] [2024-09-01 15:51:18,744][03021] Signal inference workers to stop experience collection... 
(800 times) [2024-09-01 15:51:18,854][03034] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-09-01 15:51:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3276800. Throughput: 0: 225.5. Samples: 821662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:51:20,139][00194] Avg episode reward: [(0, '20.990')] [2024-09-01 15:51:20,683][03021] Signal inference workers to resume experience collection... (800 times) [2024-09-01 15:51:20,684][03034] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-09-01 15:51:25,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3280896. Throughput: 0: 213.0. Samples: 822118. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:51:25,151][00194] Avg episode reward: [(0, '20.894')] [2024-09-01 15:51:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3289088. Throughput: 0: 227.7. Samples: 823750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:30,138][00194] Avg episode reward: [(0, '21.078')] [2024-09-01 15:51:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3293184. Throughput: 0: 233.4. Samples: 825184. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:51:35,143][00194] Avg episode reward: [(0, '21.050')] [2024-09-01 15:51:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3297280. Throughput: 0: 225.6. Samples: 825768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:51:40,139][00194] Avg episode reward: [(0, '21.044')] [2024-09-01 15:51:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3301376. Throughput: 0: 221.7. Samples: 826916. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:45,139][00194] Avg episode reward: [(0, '21.593')] [2024-09-01 15:51:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3305472. Throughput: 0: 242.6. Samples: 828966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:50,145][00194] Avg episode reward: [(0, '21.868')] [2024-09-01 15:51:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3309568. Throughput: 0: 235.1. Samples: 829338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:55,141][00194] Avg episode reward: [(0, '21.348')] [2024-09-01 15:52:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3313664. Throughput: 0: 215.9. Samples: 830438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:52:00,142][00194] Avg episode reward: [(0, '21.471')] [2024-09-01 15:52:01,626][03034] Updated weights for policy 0, policy_version 810 (0.1549) [2024-09-01 15:52:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3321856. Throughput: 0: 230.6. Samples: 832038. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:05,139][00194] Avg episode reward: [(0, '21.186')] [2024-09-01 15:52:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3325952. Throughput: 0: 243.1. Samples: 833058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:10,144][00194] Avg episode reward: [(0, '21.748')] [2024-09-01 15:52:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3330048. Throughput: 0: 228.8. Samples: 834046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:15,142][00194] Avg episode reward: [(0, '21.807')] [2024-09-01 15:52:19,027][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000814_3334144.pth... 
[2024-09-01 15:52:19,134][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000761_3117056.pth [2024-09-01 15:52:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3334144. Throughput: 0: 226.2. Samples: 835364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:20,140][00194] Avg episode reward: [(0, '21.584')] [2024-09-01 15:52:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3338240. Throughput: 0: 231.3. Samples: 836176. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:25,150][00194] Avg episode reward: [(0, '20.947')] [2024-09-01 15:52:30,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3342336. Throughput: 0: 235.4. Samples: 837508. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:52:30,142][00194] Avg episode reward: [(0, '21.145')] [2024-09-01 15:52:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3346432. Throughput: 0: 216.8. Samples: 838722. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:52:35,138][00194] Avg episode reward: [(0, '21.075')] [2024-09-01 15:52:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3350528. Throughput: 0: 224.1. Samples: 839422. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:52:40,147][00194] Avg episode reward: [(0, '20.888')] [2024-09-01 15:52:44,980][03034] Updated weights for policy 0, policy_version 820 (0.1016) [2024-09-01 15:52:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3358720. Throughput: 0: 238.8. Samples: 841186. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:45,139][00194] Avg episode reward: [(0, '21.004')] [2024-09-01 15:52:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3358720. Throughput: 0: 226.3. Samples: 842220. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:50,139][00194] Avg episode reward: [(0, '20.840')] [2024-09-01 15:52:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3366912. Throughput: 0: 217.8. Samples: 842858. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:55,141][00194] Avg episode reward: [(0, '20.895')] [2024-09-01 15:53:00,136][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3371008. Throughput: 0: 228.1. Samples: 844312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:00,139][00194] Avg episode reward: [(0, '21.330')] [2024-09-01 15:53:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3375104. Throughput: 0: 228.2. Samples: 845632. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:05,143][00194] Avg episode reward: [(0, '21.007')] [2024-09-01 15:53:10,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3379200. Throughput: 0: 224.8. Samples: 846290. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:10,139][00194] Avg episode reward: [(0, '20.868')] [2024-09-01 15:53:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3383296. Throughput: 0: 224.8. Samples: 847624. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:15,138][00194] Avg episode reward: [(0, '21.265')] [2024-09-01 15:53:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3387392. Throughput: 0: 235.4. Samples: 849314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:20,140][00194] Avg episode reward: [(0, '20.929')] [2024-09-01 15:53:25,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3391488. Throughput: 0: 229.9. Samples: 849768. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:25,145][00194] Avg episode reward: [(0, '21.351')] [2024-09-01 15:53:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3395584. Throughput: 0: 216.4. Samples: 850922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:30,149][00194] Avg episode reward: [(0, '21.766')] [2024-09-01 15:53:30,761][03034] Updated weights for policy 0, policy_version 830 (0.2096) [2024-09-01 15:53:35,136][00194] Fps is (10 sec: 1229.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3403776. Throughput: 0: 227.9. Samples: 852476. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:35,146][00194] Avg episode reward: [(0, '21.151')] [2024-09-01 15:53:40,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3407872. Throughput: 0: 236.3. Samples: 853494. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:40,140][00194] Avg episode reward: [(0, '21.429')] [2024-09-01 15:53:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3411968. Throughput: 0: 227.2. Samples: 854534. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:45,144][00194] Avg episode reward: [(0, '21.514')] [2024-09-01 15:53:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3416064. Throughput: 0: 226.6. Samples: 855828. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:50,139][00194] Avg episode reward: [(0, '21.650')] [2024-09-01 15:53:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3420160. Throughput: 0: 232.0. Samples: 856730. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:55,144][00194] Avg episode reward: [(0, '21.326')] [2024-09-01 15:54:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3424256. Throughput: 0: 231.4. Samples: 858036. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:00,144][00194] Avg episode reward: [(0, '21.083')] [2024-09-01 15:54:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3428352. Throughput: 0: 222.5. Samples: 859326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:05,138][00194] Avg episode reward: [(0, '21.089')] [2024-09-01 15:54:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3432448. Throughput: 0: 227.6. Samples: 860010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:10,146][00194] Avg episode reward: [(0, '20.824')] [2024-09-01 15:54:14,856][03034] Updated weights for policy 0, policy_version 840 (0.1524) [2024-09-01 15:54:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3440640. Throughput: 0: 240.3. Samples: 861734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:15,139][00194] Avg episode reward: [(0, '21.211')] [2024-09-01 15:54:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3440640. Throughput: 0: 227.5. Samples: 862714. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:20,144][00194] Avg episode reward: [(0, '21.716')] [2024-09-01 15:54:20,352][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth... [2024-09-01 15:54:20,468][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000787_3223552.pth [2024-09-01 15:54:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3448832. Throughput: 0: 226.2. Samples: 863674. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:25,139][00194] Avg episode reward: [(0, '21.447')] [2024-09-01 15:54:30,137][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3452928. Throughput: 0: 229.6. Samples: 864866. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:30,140][00194] Avg episode reward: [(0, '21.911')] [2024-09-01 15:54:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3457024. Throughput: 0: 230.3. Samples: 866194. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:35,139][00194] Avg episode reward: [(0, '21.879')] [2024-09-01 15:54:40,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3461120. Throughput: 0: 224.9. Samples: 866850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:40,141][00194] Avg episode reward: [(0, '21.792')] [2024-09-01 15:54:45,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3465216. Throughput: 0: 226.0. Samples: 868208. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:45,139][00194] Avg episode reward: [(0, '21.688')] [2024-09-01 15:54:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3469312. Throughput: 0: 235.2. Samples: 869912. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:50,139][00194] Avg episode reward: [(0, '21.989')] [2024-09-01 15:54:55,138][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3473408. Throughput: 0: 225.6. Samples: 870164. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:55,140][00194] Avg episode reward: [(0, '21.854')] [2024-09-01 15:55:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3477504. Throughput: 0: 223.3. Samples: 871784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:00,139][00194] Avg episode reward: [(0, '22.389')] [2024-09-01 15:55:00,217][03034] Updated weights for policy 0, policy_version 850 (0.1723) [2024-09-01 15:55:02,590][03021] Signal inference workers to stop experience collection... 
(850 times) [2024-09-01 15:55:02,648][03034] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-09-01 15:55:03,981][03021] Signal inference workers to resume experience collection... (850 times) [2024-09-01 15:55:03,983][03034] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-09-01 15:55:05,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3485696. Throughput: 0: 228.7. Samples: 873006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:05,141][00194] Avg episode reward: [(0, '21.823')] [2024-09-01 15:55:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3489792. Throughput: 0: 227.0. Samples: 873890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:10,140][00194] Avg episode reward: [(0, '21.858')] [2024-09-01 15:55:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3493888. Throughput: 0: 226.2. Samples: 875046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:15,144][00194] Avg episode reward: [(0, '21.996')] [2024-09-01 15:55:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3497984. Throughput: 0: 231.9. Samples: 876628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:20,144][00194] Avg episode reward: [(0, '21.823')] [2024-09-01 15:55:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3502080. Throughput: 0: 232.5. Samples: 877312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:25,139][00194] Avg episode reward: [(0, '22.289')] [2024-09-01 15:55:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3506176. Throughput: 0: 226.8. Samples: 878412. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:30,144][00194] Avg episode reward: [(0, '22.016')] [2024-09-01 15:55:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3510272. Throughput: 0: 223.4. Samples: 879966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:35,139][00194] Avg episode reward: [(0, '22.156')] [2024-09-01 15:55:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3518464. Throughput: 0: 233.2. Samples: 880656. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:40,141][00194] Avg episode reward: [(0, '22.198')] [2024-09-01 15:55:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3518464. Throughput: 0: 230.8. Samples: 882170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:45,139][00194] Avg episode reward: [(0, '21.941')] [2024-09-01 15:55:45,174][03034] Updated weights for policy 0, policy_version 860 (0.0529) [2024-09-01 15:55:50,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3522560. Throughput: 0: 227.2. Samples: 883228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:50,144][00194] Avg episode reward: [(0, '22.074')] [2024-09-01 15:55:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3530752. Throughput: 0: 223.4. Samples: 883944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:55,138][00194] Avg episode reward: [(0, '22.073')] [2024-09-01 15:56:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3534848. Throughput: 0: 228.3. Samples: 885318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:00,144][00194] Avg episode reward: [(0, '22.349')] [2024-09-01 15:56:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3538944. Throughput: 0: 222.0. Samples: 886618. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:05,143][00194] Avg episode reward: [(0, '22.816')] [2024-09-01 15:56:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3543040. Throughput: 0: 222.9. Samples: 887342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:10,143][00194] Avg episode reward: [(0, '23.400')] [2024-09-01 15:56:12,309][03021] Saving new best policy, reward=23.400! [2024-09-01 15:56:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3547136. Throughput: 0: 232.2. Samples: 888860. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:15,146][00194] Avg episode reward: [(0, '23.841')] [2024-09-01 15:56:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3551232. Throughput: 0: 231.4. Samples: 890380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:20,141][00194] Avg episode reward: [(0, '23.955')] [2024-09-01 15:56:21,064][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000868_3555328.pth... [2024-09-01 15:56:21,245][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000814_3334144.pth [2024-09-01 15:56:21,269][03021] Saving new best policy, reward=23.841! [2024-09-01 15:56:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3555328. Throughput: 0: 223.7. Samples: 890724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:25,140][00194] Avg episode reward: [(0, '24.086')] [2024-09-01 15:56:26,449][03021] Saving new best policy, reward=23.955! [2024-09-01 15:56:26,578][03021] Saving new best policy, reward=24.086! [2024-09-01 15:56:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3559424. Throughput: 0: 223.8. Samples: 892242. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:30,138][00194] Avg episode reward: [(0, '23.894')] [2024-09-01 15:56:30,884][03034] Updated weights for policy 0, policy_version 870 (0.1527) [2024-09-01 15:56:35,143][00194] Fps is (10 sec: 1227.9, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 3567616. Throughput: 0: 228.1. Samples: 893494. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:35,151][00194] Avg episode reward: [(0, '24.334')] [2024-09-01 15:56:39,670][03021] Saving new best policy, reward=24.334! [2024-09-01 15:56:40,141][00194] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3571712. Throughput: 0: 227.5. Samples: 894184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:40,144][00194] Avg episode reward: [(0, '24.897')] [2024-09-01 15:56:44,827][03021] Saving new best policy, reward=24.897! [2024-09-01 15:56:45,136][00194] Fps is (10 sec: 819.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3575808. Throughput: 0: 224.9. Samples: 895438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:45,138][00194] Avg episode reward: [(0, '24.088')] [2024-09-01 15:56:50,136][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3579904. Throughput: 0: 229.5. Samples: 896944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:50,139][00194] Avg episode reward: [(0, '23.959')] [2024-09-01 15:56:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3584000. Throughput: 0: 228.2. Samples: 897610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:55,146][00194] Avg episode reward: [(0, '24.016')] [2024-09-01 15:57:00,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3588096. Throughput: 0: 218.5. Samples: 898694. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:00,150][00194] Avg episode reward: [(0, '23.223')] [2024-09-01 15:57:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3592192. Throughput: 0: 219.6. Samples: 900264. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:05,146][00194] Avg episode reward: [(0, '23.286')] [2024-09-01 15:57:10,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3596288. Throughput: 0: 222.5. Samples: 900738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:57:10,148][00194] Avg episode reward: [(0, '23.471')] [2024-09-01 15:57:15,145][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3600384. Throughput: 0: 227.9. Samples: 902498. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:57:15,148][00194] Avg episode reward: [(0, '23.546')] [2024-09-01 15:57:16,193][03034] Updated weights for policy 0, policy_version 880 (0.1056) [2024-09-01 15:57:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3604480. Throughput: 0: 225.2. Samples: 903626. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:57:20,138][00194] Avg episode reward: [(0, '23.941')] [2024-09-01 15:57:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3612672. Throughput: 0: 228.4. Samples: 904460. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:25,146][00194] Avg episode reward: [(0, '23.880')] [2024-09-01 15:57:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3616768. Throughput: 0: 232.1. Samples: 905882. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:30,138][00194] Avg episode reward: [(0, '23.270')] [2024-09-01 15:57:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 3620864. Throughput: 0: 222.6. Samples: 906962. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:35,138][00194] Avg episode reward: [(0, '23.261')] [2024-09-01 15:57:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3624960. Throughput: 0: 224.9. Samples: 907730. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:40,141][00194] Avg episode reward: [(0, '23.318')] [2024-09-01 15:57:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3629056. Throughput: 0: 235.6. Samples: 909294. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:57:45,145][00194] Avg episode reward: [(0, '22.652')] [2024-09-01 15:57:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3633152. Throughput: 0: 235.8. Samples: 910876. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:57:50,144][00194] Avg episode reward: [(0, '22.464')] [2024-09-01 15:57:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3637248. Throughput: 0: 227.6. Samples: 910982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:57:55,141][00194] Avg episode reward: [(0, '21.841')] [2024-09-01 15:58:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3641344. Throughput: 0: 226.2. Samples: 912676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:58:00,149][00194] Avg episode reward: [(0, '21.566')] [2024-09-01 15:58:00,665][03034] Updated weights for policy 0, policy_version 890 (0.1654) [2024-09-01 15:58:05,138][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3649536. Throughput: 0: 229.5. Samples: 913952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:58:05,141][00194] Avg episode reward: [(0, '21.470')] [2024-09-01 15:58:10,138][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3653632. Throughput: 0: 226.7. Samples: 914660. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:58:10,140][00194] Avg episode reward: [(0, '20.831')] [2024-09-01 15:58:15,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3657728. Throughput: 0: 225.2. Samples: 916018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:58:15,138][00194] Avg episode reward: [(0, '20.667')] [2024-09-01 15:58:18,419][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth... [2024-09-01 15:58:18,539][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth [2024-09-01 15:58:20,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3661824. Throughput: 0: 235.7. Samples: 917570. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:20,138][00194] Avg episode reward: [(0, '21.120')] [2024-09-01 15:58:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3665920. Throughput: 0: 233.9. Samples: 918254. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:25,139][00194] Avg episode reward: [(0, '20.924')] [2024-09-01 15:58:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3670016. Throughput: 0: 222.7. Samples: 919316. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:30,142][00194] Avg episode reward: [(0, '21.131')] [2024-09-01 15:58:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3674112. Throughput: 0: 223.5. Samples: 920932. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:35,139][00194] Avg episode reward: [(0, '20.997')] [2024-09-01 15:58:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3678208. Throughput: 0: 236.2. Samples: 921612. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:40,139][00194] Avg episode reward: [(0, '21.241')] [2024-09-01 15:58:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3682304. Throughput: 0: 231.7. Samples: 923102. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:45,138][00194] Avg episode reward: [(0, '21.549')] [2024-09-01 15:58:45,651][03034] Updated weights for policy 0, policy_version 900 (0.0575) [2024-09-01 15:58:49,322][03021] Signal inference workers to stop experience collection... (900 times) [2024-09-01 15:58:49,396][03034] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-09-01 15:58:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3686400. Throughput: 0: 226.9. Samples: 924160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:58:50,141][00194] Avg episode reward: [(0, '21.557')] [2024-09-01 15:58:50,492][03021] Signal inference workers to resume experience collection... (900 times) [2024-09-01 15:58:50,493][03034] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-09-01 15:58:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3694592. Throughput: 0: 230.4. Samples: 925028. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:58:55,145][00194] Avg episode reward: [(0, '22.134')] [2024-09-01 15:59:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3698688. Throughput: 0: 230.0. Samples: 926366. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:59:00,140][00194] Avg episode reward: [(0, '21.879')] [2024-09-01 15:59:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3702784. Throughput: 0: 218.3. Samples: 927392. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:05,139][00194] Avg episode reward: [(0, '21.876')] [2024-09-01 15:59:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3706880. Throughput: 0: 222.0. Samples: 928244. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:10,144][00194] Avg episode reward: [(0, '21.649')] [2024-09-01 15:59:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3710976. Throughput: 0: 231.6. Samples: 929736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:15,148][00194] Avg episode reward: [(0, '21.283')] [2024-09-01 15:59:20,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3715072. Throughput: 0: 224.1. Samples: 931016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:20,150][00194] Avg episode reward: [(0, '21.560')] [2024-09-01 15:59:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3719168. Throughput: 0: 221.4. Samples: 931574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:59:25,144][00194] Avg episode reward: [(0, '22.354')] [2024-09-01 15:59:30,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3723264. Throughput: 0: 223.9. Samples: 933176. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:59:30,139][00194] Avg episode reward: [(0, '22.562')] [2024-09-01 15:59:30,695][03034] Updated weights for policy 0, policy_version 910 (0.1564) [2024-09-01 15:59:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3731456. Throughput: 0: 230.4. Samples: 934526. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:35,139][00194] Avg episode reward: [(0, '22.871')] [2024-09-01 15:59:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3735552. Throughput: 0: 229.1. Samples: 935338. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:40,142][00194] Avg episode reward: [(0, '23.149')] [2024-09-01 15:59:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3739648. Throughput: 0: 225.9. Samples: 936532. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:45,146][00194] Avg episode reward: [(0, '23.944')] [2024-09-01 15:59:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3743744. Throughput: 0: 235.4. Samples: 937984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:50,139][00194] Avg episode reward: [(0, '24.146')] [2024-09-01 15:59:55,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3747840. Throughput: 0: 230.7. Samples: 938624. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:55,144][00194] Avg episode reward: [(0, '24.034')] [2024-09-01 16:00:00,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3751936. Throughput: 0: 223.5. Samples: 939794. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:00,140][00194] Avg episode reward: [(0, '24.205')] [2024-09-01 16:00:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3756032. Throughput: 0: 228.6. Samples: 941300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:05,141][00194] Avg episode reward: [(0, '24.291')] [2024-09-01 16:00:10,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3760128. Throughput: 0: 226.6. Samples: 941772. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:10,147][00194] Avg episode reward: [(0, '24.125')] [2024-09-01 16:00:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3764224. Throughput: 0: 227.2. Samples: 943402. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:15,139][00194] Avg episode reward: [(0, '23.769')] [2024-09-01 16:00:16,114][03034] Updated weights for policy 0, policy_version 920 (0.1540) [2024-09-01 16:00:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3768320. Throughput: 0: 223.6. Samples: 944586. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:20,148][00194] Avg episode reward: [(0, '24.086')] [2024-09-01 16:00:21,046][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000921_3772416.pth... [2024-09-01 16:00:21,155][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000868_3555328.pth [2024-09-01 16:00:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3776512. Throughput: 0: 221.5. Samples: 945304. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:25,139][00194] Avg episode reward: [(0, '23.475')] [2024-09-01 16:00:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3780608. Throughput: 0: 225.3. Samples: 946672. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:30,141][00194] Avg episode reward: [(0, '24.147')] [2024-09-01 16:00:35,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3784704. Throughput: 0: 218.6. Samples: 947822. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:00:35,144][00194] Avg episode reward: [(0, '24.289')] [2024-09-01 16:00:40,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3788800. Throughput: 0: 219.0. Samples: 948482. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:00:40,151][00194] Avg episode reward: [(0, '24.021')] [2024-09-01 16:00:45,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3792896. Throughput: 0: 230.3. Samples: 950156. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:45,150][00194] Avg episode reward: [(0, '23.460')] [2024-09-01 16:00:50,137][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3796992. Throughput: 0: 231.0. Samples: 951696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:50,144][00194] Avg episode reward: [(0, '23.930')] [2024-09-01 16:00:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3801088. Throughput: 0: 227.1. Samples: 951992. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:55,138][00194] Avg episode reward: [(0, '24.220')] [2024-09-01 16:01:00,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3805184. Throughput: 0: 226.5. Samples: 953594. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:00,148][00194] Avg episode reward: [(0, '23.937')] [2024-09-01 16:01:00,770][03034] Updated weights for policy 0, policy_version 930 (0.0685) [2024-09-01 16:01:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3813376. Throughput: 0: 230.3. Samples: 954950. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:05,143][00194] Avg episode reward: [(0, '24.109')] [2024-09-01 16:01:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3813376. Throughput: 0: 231.3. Samples: 955712. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:10,143][00194] Avg episode reward: [(0, '24.109')] [2024-09-01 16:01:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3821568. Throughput: 0: 227.4. Samples: 956906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:15,138][00194] Avg episode reward: [(0, '23.607')] [2024-09-01 16:01:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3825664. Throughput: 0: 234.9. Samples: 958392. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:20,139][00194] Avg episode reward: [(0, '22.258')] [2024-09-01 16:01:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3829760. Throughput: 0: 235.4. Samples: 959072. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:25,140][00194] Avg episode reward: [(0, '22.667')] [2024-09-01 16:01:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3833856. Throughput: 0: 222.8. Samples: 960184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:30,145][00194] Avg episode reward: [(0, '23.086')] [2024-09-01 16:01:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3837952. Throughput: 0: 223.7. Samples: 961762. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:35,140][00194] Avg episode reward: [(0, '23.359')] [2024-09-01 16:01:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3842048. Throughput: 0: 233.4. Samples: 962496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:40,146][00194] Avg episode reward: [(0, '22.653')] [2024-09-01 16:01:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3846144. Throughput: 0: 230.8. Samples: 963980. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:45,139][00194] Avg episode reward: [(0, '22.871')] [2024-09-01 16:01:45,943][03034] Updated weights for policy 0, policy_version 940 (0.0555) [2024-09-01 16:01:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3850240. Throughput: 0: 227.9. Samples: 965206. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:50,146][00194] Avg episode reward: [(0, '22.790')] [2024-09-01 16:01:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3858432. Throughput: 0: 228.5. Samples: 965996. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:55,138][00194] Avg episode reward: [(0, '22.756')] [2024-09-01 16:02:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3862528. Throughput: 0: 231.2. Samples: 967312. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:00,140][00194] Avg episode reward: [(0, '22.462')] [2024-09-01 16:02:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3866624. Throughput: 0: 224.1. Samples: 968476. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:05,139][00194] Avg episode reward: [(0, '22.907')] [2024-09-01 16:02:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3870720. Throughput: 0: 224.6. Samples: 969178. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:10,145][00194] Avg episode reward: [(0, '22.353')] [2024-09-01 16:02:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3874816. Throughput: 0: 236.0. Samples: 970806. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:15,144][00194] Avg episode reward: [(0, '23.123')] [2024-09-01 16:02:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3878912. Throughput: 0: 235.2. Samples: 972346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:20,139][00194] Avg episode reward: [(0, '22.516')] [2024-09-01 16:02:21,425][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000948_3883008.pth... [2024-09-01 16:02:21,533][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth [2024-09-01 16:02:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3883008. Throughput: 0: 226.1. Samples: 972670. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:25,139][00194] Avg episode reward: [(0, '22.653')] [2024-09-01 16:02:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3887104. Throughput: 0: 229.3. Samples: 974300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:30,145][00194] Avg episode reward: [(0, '22.519')] [2024-09-01 16:02:30,804][03034] Updated weights for policy 0, policy_version 950 (0.1483) [2024-09-01 16:02:33,248][03021] Signal inference workers to stop experience collection... (950 times) [2024-09-01 16:02:33,319][03034] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-09-01 16:02:34,217][03021] Signal inference workers to resume experience collection... (950 times) [2024-09-01 16:02:34,219][03034] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-09-01 16:02:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3895296. Throughput: 0: 229.2. Samples: 975518. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:35,143][00194] Avg episode reward: [(0, '23.093')] [2024-09-01 16:02:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3899392. Throughput: 0: 226.3. Samples: 976180. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:40,141][00194] Avg episode reward: [(0, '23.167')] [2024-09-01 16:02:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3903488. Throughput: 0: 227.1. Samples: 977530. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:45,139][00194] Avg episode reward: [(0, '23.087')] [2024-09-01 16:02:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3907584. Throughput: 0: 229.9. Samples: 978822. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:50,139][00194] Avg episode reward: [(0, '23.175')] [2024-09-01 16:02:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3911680. Throughput: 0: 230.1. Samples: 979534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:55,139][00194] Avg episode reward: [(0, '23.824')] [2024-09-01 16:03:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3915776. Throughput: 0: 219.8. Samples: 980698. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:00,139][00194] Avg episode reward: [(0, '24.015')] [2024-09-01 16:03:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3919872. Throughput: 0: 220.0. Samples: 982244. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:05,139][00194] Avg episode reward: [(0, '23.490')] [2024-09-01 16:03:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3923968. Throughput: 0: 224.1. Samples: 982756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:10,149][00194] Avg episode reward: [(0, '23.447')] [2024-09-01 16:03:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3928064. Throughput: 0: 225.7. Samples: 984458. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:15,140][00194] Avg episode reward: [(0, '23.740')] [2024-09-01 16:03:16,295][03034] Updated weights for policy 0, policy_version 960 (0.2640) [2024-09-01 16:03:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3932160. Throughput: 0: 222.2. Samples: 985518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:03:20,148][00194] Avg episode reward: [(0, '24.262')] [2024-09-01 16:03:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3940352. Throughput: 0: 230.0. Samples: 986528. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:25,147][00194] Avg episode reward: [(0, '24.329')] [2024-09-01 16:03:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3944448. Throughput: 0: 226.9. Samples: 987742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:30,143][00194] Avg episode reward: [(0, '24.238')] [2024-09-01 16:03:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3948544. Throughput: 0: 225.1. Samples: 988952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:35,143][00194] Avg episode reward: [(0, '24.320')] [2024-09-01 16:03:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3952640. Throughput: 0: 226.5. Samples: 989726. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:40,138][00194] Avg episode reward: [(0, '24.216')] [2024-09-01 16:03:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3956736. Throughput: 0: 235.0. Samples: 991274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:45,150][00194] Avg episode reward: [(0, '25.185')] [2024-09-01 16:03:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3960832. Throughput: 0: 234.0. Samples: 992774. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:50,140][00194] Avg episode reward: [(0, '25.249')] [2024-09-01 16:03:51,110][03021] Saving new best policy, reward=25.185! [2024-09-01 16:03:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3964928. Throughput: 0: 230.9. Samples: 993148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:55,142][00194] Avg episode reward: [(0, '25.319')] [2024-09-01 16:03:56,231][03021] Saving new best policy, reward=25.249! [2024-09-01 16:03:59,995][03021] Saving new best policy, reward=25.319! 
[2024-09-01 16:04:00,007][03034] Updated weights for policy 0, policy_version 970 (0.1020) [2024-09-01 16:04:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3973120. Throughput: 0: 229.5. Samples: 994784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:04:00,138][00194] Avg episode reward: [(0, '25.156')] [2024-09-01 16:04:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3977216. Throughput: 0: 230.3. Samples: 995880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:04:05,146][00194] Avg episode reward: [(0, '25.008')] [2024-09-01 16:04:10,139][00194] Fps is (10 sec: 818.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3981312. Throughput: 0: 224.2. Samples: 996620. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:10,147][00194] Avg episode reward: [(0, '25.910')] [2024-09-01 16:04:14,339][03021] Saving new best policy, reward=25.910! [2024-09-01 16:04:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3985408. Throughput: 0: 224.9. Samples: 997864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:15,146][00194] Avg episode reward: [(0, '25.117')] [2024-09-01 16:04:18,206][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth... [2024-09-01 16:04:18,320][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000921_3772416.pth [2024-09-01 16:04:20,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3989504. Throughput: 0: 235.2. Samples: 999536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:20,143][00194] Avg episode reward: [(0, '24.987')] [2024-09-01 16:04:25,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3993600. Throughput: 0: 233.1. Samples: 1000218. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:25,149][00194] Avg episode reward: [(0, '25.077')] [2024-09-01 16:04:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3997696. Throughput: 0: 224.0. Samples: 1001352. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:30,141][00194] Avg episode reward: [(0, '24.695')] [2024-09-01 16:04:35,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4001792. Throughput: 0: 226.2. Samples: 1002954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:35,139][00194] Avg episode reward: [(0, '24.647')] [2024-09-01 16:04:36,205][03021] Stopping Batcher_0... [2024-09-01 16:04:36,207][03021] Loop batcher_evt_loop terminating... [2024-09-01 16:04:36,207][00194] Component Batcher_0 stopped! [2024-09-01 16:04:36,419][03034] Weights refcount: 2 0 [2024-09-01 16:04:36,423][00194] Component InferenceWorker_p0-w0 stopped! [2024-09-01 16:04:36,429][03034] Stopping InferenceWorker_p0-w0... [2024-09-01 16:04:36,430][03034] Loop inference_proc0-0_evt_loop terminating... [2024-09-01 16:04:36,780][00194] Component RolloutWorker_w2 stopped! [2024-09-01 16:04:36,788][03037] Stopping RolloutWorker_w2... [2024-09-01 16:04:36,814][03037] Loop rollout_proc2_evt_loop terminating... [2024-09-01 16:04:36,840][00194] Component RolloutWorker_w1 stopped! [2024-09-01 16:04:36,857][00194] Component RolloutWorker_w4 stopped! [2024-09-01 16:04:36,865][00194] Component RolloutWorker_w3 stopped! [2024-09-01 16:04:36,841][03036] Stopping RolloutWorker_w1... [2024-09-01 16:04:36,887][03036] Loop rollout_proc1_evt_loop terminating... [2024-09-01 16:04:36,890][00194] Component RolloutWorker_w6 stopped! [2024-09-01 16:04:36,911][03040] Stopping RolloutWorker_w5... [2024-09-01 16:04:36,911][00194] Component RolloutWorker_w5 stopped! [2024-09-01 16:04:36,926][00194] Component RolloutWorker_w7 stopped! 
[2024-09-01 16:04:36,871][03039] Stopping RolloutWorker_w4... [2024-09-01 16:04:36,888][03038] Stopping RolloutWorker_w3... [2024-09-01 16:04:36,946][00194] Component RolloutWorker_w0 stopped! [2024-09-01 16:04:36,898][03041] Stopping RolloutWorker_w6... [2024-09-01 16:04:36,954][03040] Loop rollout_proc5_evt_loop terminating... [2024-09-01 16:04:36,962][03038] Loop rollout_proc3_evt_loop terminating... [2024-09-01 16:04:36,952][03035] Stopping RolloutWorker_w0... [2024-09-01 16:04:36,954][03039] Loop rollout_proc4_evt_loop terminating... [2024-09-01 16:04:36,964][03041] Loop rollout_proc6_evt_loop terminating... [2024-09-01 16:04:36,945][03042] Stopping RolloutWorker_w7... [2024-09-01 16:04:36,992][03042] Loop rollout_proc7_evt_loop terminating... [2024-09-01 16:04:36,994][03035] Loop rollout_proc0_evt_loop terminating... [2024-09-01 16:04:41,411][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2024-09-01 16:04:41,524][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000948_3883008.pth [2024-09-01 16:04:41,548][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2024-09-01 16:04:41,741][03021] Stopping LearnerWorker_p0... [2024-09-01 16:04:41,741][03021] Loop learner_proc0_evt_loop terminating... [2024-09-01 16:04:41,741][00194] Component LearnerWorker_p0 stopped! [2024-09-01 16:04:41,745][00194] Waiting for process learner_proc0 to stop... [2024-09-01 16:04:43,065][00194] Waiting for process inference_proc0-0 to join... [2024-09-01 16:04:43,073][00194] Waiting for process rollout_proc0 to join... [2024-09-01 16:04:44,240][00194] Waiting for process rollout_proc1 to join... [2024-09-01 16:04:44,250][00194] Waiting for process rollout_proc2 to join... [2024-09-01 16:04:44,278][00194] Waiting for process rollout_proc3 to join... [2024-09-01 16:04:44,286][00194] Waiting for process rollout_proc4 to join... 
[2024-09-01 16:04:44,297][00194] Waiting for process rollout_proc5 to join... [2024-09-01 16:04:44,301][00194] Waiting for process rollout_proc6 to join... [2024-09-01 16:04:44,309][00194] Waiting for process rollout_proc7 to join... [2024-09-01 16:04:44,314][00194] Batcher 0 profile tree view: batching: 20.5903, releasing_batches: 0.2968 [2024-09-01 16:04:44,318][00194] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 44.9605 update_model: 132.0643 weight_update: 0.0560 one_step: 0.0290 handle_policy_step: 2882.0811 deserialize: 91.9792, stack: 14.2129, obs_to_device_normalize: 498.6304, forward: 2093.8209, send_messages: 67.3715 prepare_outputs: 34.0184 to_cpu: 3.4641 [2024-09-01 16:04:44,320][00194] Learner 0 profile tree view: misc: 0.0066, prepare_batch: 1289.7853 train: 3103.9562 epoch_init: 0.0091, minibatch_init: 0.0245, losses_postprocess: 0.1424, kl_divergence: 0.4661, after_optimizer: 2.6855 calculate_losses: 1524.2036 losses_init: 0.0044, forward_head: 1364.9844, bptt_initial: 4.4690, tail: 3.4347, advantages_returns: 0.2297, losses: 1.4560 bptt: 149.0615 bptt_forward_core: 148.1692 update: 1575.6538 clip: 3.8090 [2024-09-01 16:04:44,323][00194] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.6465, enqueue_policy_requests: 55.2539, env_step: 1623.3020, overhead: 40.2221, complete_rollouts: 17.8695 save_policy_outputs: 40.9623 split_output_tensors: 12.9830 [2024-09-01 16:04:44,325][00194] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.7192, enqueue_policy_requests: 54.9433, env_step: 1625.3504, overhead: 40.9641, complete_rollouts: 15.5030 save_policy_outputs: 41.8531 split_output_tensors: 13.9784 [2024-09-01 16:04:44,327][00194] Loop Runner_EvtLoop terminating... 
[2024-09-01 16:04:44,329][00194] Runner profile tree view: main_loop: 4465.4549 [2024-09-01 16:04:44,331][00194] Collected {0: 4009984}, FPS: 898.0 [2024-09-01 16:05:41,893][00194] Environment doom_basic already registered, overwriting... [2024-09-01 16:05:41,897][00194] Environment doom_two_colors_easy already registered, overwriting... [2024-09-01 16:05:41,898][00194] Environment doom_two_colors_hard already registered, overwriting... [2024-09-01 16:05:41,901][00194] Environment doom_dm already registered, overwriting... [2024-09-01 16:05:41,903][00194] Environment doom_dwango5 already registered, overwriting... [2024-09-01 16:05:41,905][00194] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-01 16:05:41,907][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-01 16:05:41,908][00194] Environment doom_my_way_home already registered, overwriting... [2024-09-01 16:05:41,911][00194] Environment doom_deadly_corridor already registered, overwriting... [2024-09-01 16:05:41,912][00194] Environment doom_defend_the_center already registered, overwriting... [2024-09-01 16:05:41,914][00194] Environment doom_defend_the_line already registered, overwriting... [2024-09-01 16:05:41,915][00194] Environment doom_health_gathering already registered, overwriting... [2024-09-01 16:05:41,917][00194] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-01 16:05:41,920][00194] Environment doom_battle already registered, overwriting... [2024-09-01 16:05:41,922][00194] Environment doom_battle2 already registered, overwriting... [2024-09-01 16:05:41,924][00194] Environment doom_duel_bots already registered, overwriting... [2024-09-01 16:05:41,926][00194] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-01 16:05:41,927][00194] Environment doom_duel already registered, overwriting... 
[2024-09-01 16:05:41,928][00194] Environment doom_deathmatch_full already registered, overwriting... [2024-09-01 16:05:41,930][00194] Environment doom_benchmark already registered, overwriting... [2024-09-01 16:05:41,931][00194] register_encoder_factory: [2024-09-01 16:05:41,965][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-01 16:05:41,975][00194] Experiment dir /content/train_dir/default_experiment already exists! [2024-09-01 16:05:41,976][00194] Resuming existing experiment from /content/train_dir/default_experiment... [2024-09-01 16:05:41,980][00194] Weights and Biases integration disabled [2024-09-01 16:05:41,986][00194] Environment var CUDA_VISIBLE_DEVICES is [2024-09-01 16:05:45,681][00194] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=cpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True 
force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-09-01 16:05:45,685][00194] Saving configuration to /content/train_dir/default_experiment/config.json... 
[2024-09-01 16:05:45,693][00194] Rollout worker 0 uses device cpu
[2024-09-01 16:05:45,697][00194] Rollout worker 1 uses device cpu
[2024-09-01 16:05:45,701][00194] Rollout worker 2 uses device cpu
[2024-09-01 16:05:45,704][00194] Rollout worker 3 uses device cpu
[2024-09-01 16:05:45,706][00194] Rollout worker 4 uses device cpu
[2024-09-01 16:05:45,707][00194] Rollout worker 5 uses device cpu
[2024-09-01 16:05:45,712][00194] Rollout worker 6 uses device cpu
[2024-09-01 16:05:45,715][00194] Rollout worker 7 uses device cpu
[2024-09-01 16:05:45,925][00194] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 16:05:45,969][00194] Starting all processes...
[2024-09-01 16:05:45,971][00194] Starting process learner_proc0
[2024-09-01 16:05:46,019][00194] Starting all processes...
[2024-09-01 16:05:46,027][00194] Starting process inference_proc0-0
[2024-09-01 16:05:46,028][00194] Starting process rollout_proc0
[2024-09-01 16:05:46,028][00194] Starting process rollout_proc1
[2024-09-01 16:05:46,028][00194] Starting process rollout_proc2
[2024-09-01 16:05:46,029][00194] Starting process rollout_proc3
[2024-09-01 16:05:46,029][00194] Starting process rollout_proc4
[2024-09-01 16:05:46,029][00194] Starting process rollout_proc5
[2024-09-01 16:05:46,038][00194] Starting process rollout_proc7
[2024-09-01 16:05:46,038][00194] Starting process rollout_proc6
[2024-09-01 16:06:06,170][25505] Starting seed is not provided
[2024-09-01 16:06:06,170][25505] Initializing actor-critic model on device cpu
[2024-09-01 16:06:06,171][25505] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:06:06,174][25505] RunningMeanStd input shape: (1,)
[2024-09-01 16:06:06,180][00194] Heartbeat connected on Batcher_0
[2024-09-01 16:06:06,359][25505] ConvEncoder: input_channels=3
[2024-09-01 16:06:06,444][25520] Worker 1 uses CPU cores [1]
[2024-09-01 16:06:06,584][25523] Worker 4 uses CPU cores [0]
[2024-09-01 16:06:06,611][25524] Worker 5 uses CPU cores [1]
[2024-09-01 16:06:06,653][00194] Heartbeat connected on RolloutWorker_w1
[2024-09-01 16:06:06,804][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 16:06:06,850][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 16:06:06,868][25522] Worker 3 uses CPU cores [1]
[2024-09-01 16:06:06,906][25518] Worker 0 uses CPU cores [0]
[2024-09-01 16:06:06,916][25525] Worker 6 uses CPU cores [0]
[2024-09-01 16:06:06,920][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 16:06:06,977][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 16:06:06,987][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 16:06:06,993][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 16:06:07,002][25521] Worker 2 uses CPU cores [0]
[2024-09-01 16:06:07,014][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 16:06:07,021][25526] Worker 7 uses CPU cores [1]
[2024-09-01 16:06:07,032][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 16:06:07,100][25505] Conv encoder output size: 512
[2024-09-01 16:06:07,101][25505] Policy head output size: 512
[2024-09-01 16:06:07,129][25505] Created Actor Critic model with architecture:
[2024-09-01 16:06:07,130][25505] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 16:06:07,851][25505] Using optimizer
[2024-09-01 16:06:07,853][25505] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2024-09-01 16:06:07,924][25505] Loading model from checkpoint
[2024-09-01 16:06:07,984][25505] Loaded experiment state at self.train_step=979, self.env_steps=4009984
[2024-09-01 16:06:07,985][25505] Initialized policy 0 weights for model version 979
[2024-09-01 16:06:07,990][25519] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:06:07,994][25505] LearnerWorker_p0 finished initialization!
[2024-09-01 16:06:07,993][25519] RunningMeanStd input shape: (1,)
[2024-09-01 16:06:08,001][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 16:06:08,026][25519] ConvEncoder: input_channels=3
[2024-09-01 16:06:08,238][25519] Conv encoder output size: 512
[2024-09-01 16:06:08,238][25519] Policy head output size: 512
[2024-09-01 16:06:08,271][00194] Inference worker 0-0 is ready!
[2024-09-01 16:06:08,275][00194] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 16:06:08,469][25522] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,472][25520] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,474][25526] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,480][25521] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,486][25525] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,477][25524] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,490][25518] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:08,492][25523] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:06:10,166][25526] Decorrelating experience for 0 frames...
[2024-09-01 16:06:10,171][25520] Decorrelating experience for 0 frames...
[2024-09-01 16:06:10,175][25522] Decorrelating experience for 0 frames...
[2024-09-01 16:06:10,526][25521] Decorrelating experience for 0 frames...
[2024-09-01 16:06:10,549][25525] Decorrelating experience for 0 frames...
[2024-09-01 16:06:10,554][25518] Decorrelating experience for 0 frames...
[2024-09-01 16:06:10,553][25523] Decorrelating experience for 0 frames...
[2024-09-01 16:06:11,487][25521] Decorrelating experience for 32 frames...
[2024-09-01 16:06:11,492][25525] Decorrelating experience for 32 frames...
[2024-09-01 16:06:11,731][25526] Decorrelating experience for 32 frames...
[2024-09-01 16:06:11,806][25524] Decorrelating experience for 0 frames...
[2024-09-01 16:06:11,986][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4009984. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:06:12,433][25520] Decorrelating experience for 32 frames...
[2024-09-01 16:06:12,518][25522] Decorrelating experience for 32 frames...
[2024-09-01 16:06:12,869][25521] Decorrelating experience for 64 frames...
[2024-09-01 16:06:13,105][25523] Decorrelating experience for 32 frames...
[2024-09-01 16:06:13,232][25518] Decorrelating experience for 32 frames...
[2024-09-01 16:06:13,808][25524] Decorrelating experience for 32 frames...
[2024-09-01 16:06:14,045][25526] Decorrelating experience for 64 frames...
[2024-09-01 16:06:14,888][25521] Decorrelating experience for 96 frames...
[2024-09-01 16:06:14,957][25520] Decorrelating experience for 64 frames...
[2024-09-01 16:06:15,259][25518] Decorrelating experience for 64 frames...
[2024-09-01 16:06:16,246][25522] Decorrelating experience for 64 frames...
[2024-09-01 16:06:16,688][25524] Decorrelating experience for 64 frames...
[2024-09-01 16:06:16,988][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4009984. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:06:16,991][00194] Avg episode reward: [(0, '0.320')]
[2024-09-01 16:06:17,033][25526] Decorrelating experience for 96 frames...
[2024-09-01 16:06:17,742][25525] Decorrelating experience for 64 frames...
[2024-09-01 16:06:19,759][25518] Decorrelating experience for 96 frames...
[2024-09-01 16:06:20,199][25522] Decorrelating experience for 96 frames...
[2024-09-01 16:06:20,358][25520] Decorrelating experience for 96 frames...
[2024-09-01 16:06:20,698][25524] Decorrelating experience for 96 frames...
[2024-09-01 16:06:21,988][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4009984. Throughput: 0: 66.6. Samples: 666. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:06:21,992][00194] Avg episode reward: [(0, '0.320')]
[2024-09-01 16:06:22,087][25525] Decorrelating experience for 96 frames...
[2024-09-01 16:06:22,583][25523] Decorrelating experience for 64 frames...
[2024-09-01 16:06:23,851][25523] Decorrelating experience for 96 frames...
[2024-09-01 16:06:26,012][25505] Signal inference workers to stop experience collection...
[2024-09-01 16:06:26,059][25519] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 16:06:26,986][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4009984. Throughput: 0: 175.6. Samples: 2634. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:06:26,993][00194] Avg episode reward: [(0, '2.320')]
[2024-09-01 16:06:27,908][25505] Signal inference workers to resume experience collection...
[2024-09-01 16:06:27,910][25505] Stopping Batcher_0...
[2024-09-01 16:06:27,913][25505] Loop batcher_evt_loop terminating...
[2024-09-01 16:06:27,921][00194] Component Batcher_0 stopped!
[2024-09-01 16:06:27,947][25519] Weights refcount: 2 0
[2024-09-01 16:06:27,950][25519] Stopping InferenceWorker_p0-w0...
[2024-09-01 16:06:27,951][25519] Loop inference_proc0-0_evt_loop terminating...
[2024-09-01 16:06:27,950][00194] Component InferenceWorker_p0-w0 stopped!
[2024-09-01 16:06:28,400][25523] Stopping RolloutWorker_w4...
[2024-09-01 16:06:28,400][00194] Component RolloutWorker_w4 stopped!
[2024-09-01 16:06:28,402][25523] Loop rollout_proc4_evt_loop terminating...
[2024-09-01 16:06:28,415][25521] Stopping RolloutWorker_w2...
[2024-09-01 16:06:28,415][00194] Component RolloutWorker_w2 stopped!
[2024-09-01 16:06:28,418][25521] Loop rollout_proc2_evt_loop terminating...
[2024-09-01 16:06:28,431][25525] Stopping RolloutWorker_w6...
[2024-09-01 16:06:28,431][00194] Component RolloutWorker_w6 stopped!
[2024-09-01 16:06:28,439][25525] Loop rollout_proc6_evt_loop terminating...
[2024-09-01 16:06:28,465][25520] Stopping RolloutWorker_w1...
[2024-09-01 16:06:28,465][00194] Component RolloutWorker_w1 stopped!
[2024-09-01 16:06:28,466][25520] Loop rollout_proc1_evt_loop terminating...
[2024-09-01 16:06:28,493][25522] Stopping RolloutWorker_w3...
[2024-09-01 16:06:28,493][00194] Component RolloutWorker_w3 stopped!
[2024-09-01 16:06:28,493][25522] Loop rollout_proc3_evt_loop terminating...
[2024-09-01 16:06:28,509][25526] Stopping RolloutWorker_w7...
[2024-09-01 16:06:28,510][00194] Component RolloutWorker_w7 stopped!
[2024-09-01 16:06:28,517][00194] Component RolloutWorker_w5 stopped!
[2024-09-01 16:06:28,523][25524] Stopping RolloutWorker_w5...
[2024-09-01 16:06:28,510][25526] Loop rollout_proc7_evt_loop terminating...
[2024-09-01 16:06:28,524][25524] Loop rollout_proc5_evt_loop terminating...
[2024-09-01 16:06:28,569][25518] Stopping RolloutWorker_w0...
[2024-09-01 16:06:28,569][00194] Component RolloutWorker_w0 stopped!
[2024-09-01 16:06:28,578][25518] Loop rollout_proc0_evt_loop terminating...
[2024-09-01 16:06:33,646][25505] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth...
[2024-09-01 16:06:33,725][25505] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth
[2024-09-01 16:06:33,737][25505] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth...
[2024-09-01 16:06:33,878][00194] Component LearnerWorker_p0 stopped!
[2024-09-01 16:06:33,885][00194] Waiting for process learner_proc0 to stop...
[2024-09-01 16:06:33,890][25505] Stopping LearnerWorker_p0...
[2024-09-01 16:06:33,891][25505] Loop learner_proc0_evt_loop terminating...
[2024-09-01 16:06:34,540][00194] Waiting for process inference_proc0-0 to join...
[2024-09-01 16:06:34,545][00194] Waiting for process rollout_proc0 to join...
[2024-09-01 16:06:34,550][00194] Waiting for process rollout_proc1 to join...
[2024-09-01 16:06:34,556][00194] Waiting for process rollout_proc2 to join...
[2024-09-01 16:06:34,560][00194] Waiting for process rollout_proc3 to join...
[2024-09-01 16:06:34,566][00194] Waiting for process rollout_proc4 to join...
[2024-09-01 16:06:34,570][00194] Waiting for process rollout_proc5 to join...
[2024-09-01 16:06:34,574][00194] Waiting for process rollout_proc6 to join...
[2024-09-01 16:06:34,580][00194] Waiting for process rollout_proc7 to join...
[2024-09-01 16:06:34,583][00194] Batcher 0 profile tree view:
batching: 0.0506, releasing_batches: 0.0020
[2024-09-01 16:06:34,586][00194] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0646
wait_policy: 0.0001
  wait_policy_total: 9.7355
one_step: 0.0318
  handle_policy_step: 7.4827
    deserialize: 0.2000, stack: 0.0383, obs_to_device_normalize: 1.1280, forward: 5.5730, send_messages: 0.2052
    prepare_outputs: 0.1651
      to_cpu: 0.0130
[2024-09-01 16:06:34,590][00194] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 4.1518
train: 6.0339
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0007, after_optimizer: 0.0047
  calculate_losses: 2.2055
    losses_init: 0.0000, forward_head: 1.9855, bptt_initial: 0.0043, tail: 0.0103, advantages_returns: 0.0010, losses: 0.0028
    bptt: 0.2010
      bptt_forward_core: 0.1998
  update: 3.8215
    clip: 0.0086
[2024-09-01 16:06:34,592][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.1724, env_step: 2.7167, overhead: 0.0721, complete_rollouts: 0.0140
save_policy_outputs: 0.1292
  split_output_tensors: 0.0193
[2024-09-01 16:06:34,595][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0038, enqueue_policy_requests: 0.6344, env_step: 4.8570, overhead: 0.1751, complete_rollouts: 0.0165
save_policy_outputs: 0.2269
  split_output_tensors: 0.0810
[2024-09-01 16:06:34,599][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 16:06:34,603][00194] Runner profile tree view:
main_loop: 48.6341
[2024-09-01 16:06:34,605][00194] Collected {0: 4018176}, FPS: 168.4
[2024-09-01 16:06:48,086][00194] Environment doom_basic already registered, overwriting...
[2024-09-01 16:06:48,089][00194] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-01 16:06:48,092][00194] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-01 16:06:48,097][00194] Environment doom_dm already registered, overwriting...
[2024-09-01 16:06:48,100][00194] Environment doom_dwango5 already registered, overwriting...
[2024-09-01 16:06:48,101][00194] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-01 16:06:48,103][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-01 16:06:48,104][00194] Environment doom_my_way_home already registered, overwriting...
[2024-09-01 16:06:48,106][00194] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-01 16:06:48,107][00194] Environment doom_defend_the_center already registered, overwriting...
[2024-09-01 16:06:48,109][00194] Environment doom_defend_the_line already registered, overwriting...
[2024-09-01 16:06:48,110][00194] Environment doom_health_gathering already registered, overwriting...
[2024-09-01 16:06:48,112][00194] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-01 16:06:48,113][00194] Environment doom_battle already registered, overwriting...
[2024-09-01 16:06:48,115][00194] Environment doom_battle2 already registered, overwriting...
[2024-09-01 16:06:48,116][00194] Environment doom_duel_bots already registered, overwriting...
[2024-09-01 16:06:48,117][00194] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-01 16:06:48,119][00194] Environment doom_duel already registered, overwriting...
[2024-09-01 16:06:48,121][00194] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-01 16:06:48,122][00194] Environment doom_benchmark already registered, overwriting...
[2024-09-01 16:06:48,124][00194] register_encoder_factory:
[2024-09-01 16:06:48,154][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 16:06:48,160][00194] Overriding arg 'train_for_env_steps' with value 6000000 passed from command line
[2024-09-01 16:06:48,167][00194] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-01 16:06:48,171][00194] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-01 16:06:48,172][00194] Weights and Biases integration disabled
[2024-09-01 16:06:48,177][00194] Environment var CUDA_VISIBLE_DEVICES is
[2024-09-01 16:06:50,270][00194] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=6000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-01 16:06:50,273][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-01 16:06:50,277][00194] Rollout worker 0 uses device cpu
[2024-09-01 16:06:50,279][00194] Rollout worker 1 uses device cpu
[2024-09-01 16:06:50,281][00194] Rollout worker 2 uses device cpu
[2024-09-01 16:06:50,283][00194] Rollout worker 3 uses device cpu
[2024-09-01 16:06:50,284][00194] Rollout worker 4 uses device cpu
[2024-09-01 16:06:50,286][00194] Rollout worker 5 uses device cpu
[2024-09-01 16:06:50,287][00194] Rollout worker 6 uses device cpu
[2024-09-01 16:06:50,288][00194] Rollout worker 7 uses device cpu
[2024-09-01 16:06:50,458][00194] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 16:06:50,500][00194] Starting all processes...
[2024-09-01 16:06:50,502][00194] Starting process learner_proc0
[2024-09-01 16:06:50,557][00194] Starting all processes...
[2024-09-01 16:06:50,565][00194] Starting process inference_proc0-0
[2024-09-01 16:06:50,566][00194] Starting process rollout_proc0
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc1
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc2
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc3
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc4
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc5
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc6
[2024-09-01 16:06:50,568][00194] Starting process rollout_proc7
[2024-09-01 16:07:05,585][26021] Worker 5 uses CPU cores [1]
[2024-09-01 16:07:05,609][26019] Worker 3 uses CPU cores [1]
[2024-09-01 16:07:05,647][26016] Worker 0 uses CPU cores [0]
[2024-09-01 16:07:05,921][26018] Worker 1 uses CPU cores [1]
[2024-09-01 16:07:05,924][26020] Worker 4 uses CPU cores [0]
[2024-09-01 16:07:05,960][26002] Starting seed is not provided
[2024-09-01 16:07:05,961][26002] Initializing actor-critic model on device cpu
[2024-09-01 16:07:05,961][26002] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:07:05,963][26002] RunningMeanStd input shape: (1,)
[2024-09-01 16:07:06,027][26022] Worker 6 uses CPU cores [0]
[2024-09-01 16:07:06,034][26002] ConvEncoder: input_channels=3
[2024-09-01 16:07:06,109][26023] Worker 7 uses CPU cores [1]
[2024-09-01 16:07:06,119][26017] Worker 2 uses CPU cores [0]
[2024-09-01 16:07:06,249][26002] Conv encoder output size: 512
[2024-09-01 16:07:06,250][26002] Policy head output size: 512
[2024-09-01 16:07:06,267][26002] Created Actor Critic model with architecture:
[2024-09-01 16:07:06,267][26002] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 16:07:06,769][26002] Using optimizer
[2024-09-01 16:07:06,771][26002] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth...
[2024-09-01 16:07:06,812][26002] Loading model from checkpoint
[2024-09-01 16:07:06,841][26002] Loaded experiment state at self.train_step=981, self.env_steps=4018176
[2024-09-01 16:07:06,842][26002] Initialized policy 0 weights for model version 981
[2024-09-01 16:07:06,844][26002] LearnerWorker_p0 finished initialization!
[2024-09-01 16:07:06,849][26015] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:07:06,850][26015] RunningMeanStd input shape: (1,)
[2024-09-01 16:07:06,874][26015] ConvEncoder: input_channels=3
[2024-09-01 16:07:07,027][26015] Conv encoder output size: 512
[2024-09-01 16:07:07,028][26015] Policy head output size: 512
[2024-09-01 16:07:07,050][00194] Inference worker 0-0 is ready!
[2024-09-01 16:07:07,052][00194] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 16:07:07,187][26023] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,190][26019] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,193][26021] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,203][26018] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,227][26022] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,224][26016] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,252][26017] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:07,258][26020] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:07:08,177][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4018176. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:07:08,257][26022] Decorrelating experience for 0 frames...
[2024-09-01 16:07:08,273][26020] Decorrelating experience for 0 frames...
[2024-09-01 16:07:09,124][26023] Decorrelating experience for 0 frames...
[2024-09-01 16:07:09,130][26021] Decorrelating experience for 0 frames...
[2024-09-01 16:07:09,129][26019] Decorrelating experience for 0 frames...
[2024-09-01 16:07:09,141][26018] Decorrelating experience for 0 frames...
[2024-09-01 16:07:09,195][26022] Decorrelating experience for 32 frames...
[2024-09-01 16:07:09,216][26020] Decorrelating experience for 32 frames...
[2024-09-01 16:07:10,447][00194] Heartbeat connected on Batcher_0
[2024-09-01 16:07:10,453][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 16:07:10,493][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 16:07:10,542][26023] Decorrelating experience for 32 frames...
[2024-09-01 16:07:10,545][26018] Decorrelating experience for 32 frames...
[2024-09-01 16:07:10,607][26017] Decorrelating experience for 0 frames...
[2024-09-01 16:07:10,649][26016] Decorrelating experience for 0 frames...
[2024-09-01 16:07:10,787][26019] Decorrelating experience for 32 frames...
[2024-09-01 16:07:10,869][26020] Decorrelating experience for 64 frames...
[2024-09-01 16:07:11,744][26021] Decorrelating experience for 32 frames...
[2024-09-01 16:07:11,852][26018] Decorrelating experience for 64 frames...
[2024-09-01 16:07:12,576][26016] Decorrelating experience for 32 frames...
[2024-09-01 16:07:12,593][26017] Decorrelating experience for 32 frames...
[2024-09-01 16:07:12,868][26022] Decorrelating experience for 64 frames...
[2024-09-01 16:07:13,171][26020] Decorrelating experience for 96 frames...
[2024-09-01 16:07:13,178][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4018176. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:07:13,611][26021] Decorrelating experience for 64 frames...
[2024-09-01 16:07:13,699][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 16:07:13,834][26018] Decorrelating experience for 96 frames...
[2024-09-01 16:07:14,414][00194] Heartbeat connected on RolloutWorker_w1
[2024-09-01 16:07:15,125][26016] Decorrelating experience for 64 frames...
[2024-09-01 16:07:15,224][26023] Decorrelating experience for 64 frames...
[2024-09-01 16:07:17,374][26021] Decorrelating experience for 96 frames...
[2024-09-01 16:07:17,847][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 16:07:18,177][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4018176. Throughput: 0: 40.2. Samples: 402. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:07:18,184][00194] Avg episode reward: [(0, '3.420')]
[2024-09-01 16:07:18,806][26022] Decorrelating experience for 96 frames...
[2024-09-01 16:07:18,892][26017] Decorrelating experience for 64 frames...
[2024-09-01 16:07:19,271][26016] Decorrelating experience for 96 frames...
[2024-09-01 16:07:19,611][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 16:07:20,302][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 16:07:22,819][26019] Decorrelating experience for 64 frames...
[2024-09-01 16:07:23,177][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4018176. Throughput: 0: 108.4. Samples: 1626. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:07:23,179][00194] Avg episode reward: [(0, '4.904')]
[2024-09-01 16:07:23,689][26017] Decorrelating experience for 96 frames...
[2024-09-01 16:07:24,318][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 16:07:24,681][26002] Signal inference workers to stop experience collection...
[2024-09-01 16:07:24,723][26015] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 16:07:25,227][26023] Decorrelating experience for 96 frames...
[2024-09-01 16:07:25,403][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 16:07:25,464][26019] Decorrelating experience for 96 frames...
[2024-09-01 16:07:25,565][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 16:07:25,848][26002] Signal inference workers to resume experience collection...
[2024-09-01 16:07:25,849][26015] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 16:07:28,177][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4022272. Throughput: 0: 164.8. Samples: 3296. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 16:07:28,185][00194] Avg episode reward: [(0, '4.277')]
[2024-09-01 16:07:33,179][00194] Fps is (10 sec: 819.0, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 4026368. Throughput: 0: 149.4. Samples: 3736. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 16:07:33,183][00194] Avg episode reward: [(0, '7.907')]
[2024-09-01 16:07:38,179][00194] Fps is (10 sec: 819.1, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4030464. Throughput: 0: 148.7. Samples: 4460. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:07:38,188][00194] Avg episode reward: [(0, '8.065')]
[2024-09-01 16:07:43,177][00194] Fps is (10 sec: 819.4, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 4034560. Throughput: 0: 166.7. Samples: 5836. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:07:43,184][00194] Avg episode reward: [(0, '8.912')]
[2024-09-01 16:07:48,177][00194] Fps is (10 sec: 819.3, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 4038656. Throughput: 0: 163.5. Samples: 6540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:07:48,180][00194] Avg episode reward: [(0, '9.919')]
[2024-09-01 16:07:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 4042752. Throughput: 0: 184.8. Samples: 8314. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:07:53,180][00194] Avg episode reward: [(0, '10.831')]
[2024-09-01 16:07:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 4046848. Throughput: 0: 203.1. Samples: 9140. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:07:58,183][00194] Avg episode reward: [(0, '11.359')]
[2024-09-01 16:08:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 595.8, 300 sec: 595.8). Total num frames: 4050944. Throughput: 0: 216.0. Samples: 10124. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:08:03,185][00194] Avg episode reward: [(0, '12.138')]
[2024-09-01 16:08:07,361][26015] Updated weights for policy 0, policy_version 991 (0.1120)
[2024-09-01 16:08:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 4059136. Throughput: 0: 223.2. Samples: 11668. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:08,183][00194] Avg episode reward: [(0, '12.743')]
[2024-09-01 16:08:13,182][00194] Fps is (10 sec: 1228.2, 60 sec: 750.9, 300 sec: 693.1). Total num frames: 4063232. Throughput: 0: 203.1. Samples: 12436. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:13,193][00194] Avg episode reward: [(0, '13.246')]
[2024-09-01 16:08:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 4067328. Throughput: 0: 216.7. Samples: 13488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:18,185][00194] Avg episode reward: [(0, '13.947')]
[2024-09-01 16:08:23,179][00194] Fps is (10 sec: 819.5, 60 sec: 887.4, 300 sec: 710.0). Total num frames: 4071424. Throughput: 0: 230.4. Samples: 14830. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:23,181][00194] Avg episode reward: [(0, '14.673')]
[2024-09-01 16:08:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 716.8). Total num frames: 4075520. Throughput: 0: 240.9. Samples: 16678. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:28,180][00194] Avg episode reward: [(0, '14.769')]
[2024-09-01 16:08:33,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 722.8). Total num frames: 4079616. Throughput: 0: 229.7. Samples: 16876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:33,191][00194] Avg episode reward: [(0, '14.831')]
[2024-09-01 16:08:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 4083712. Throughput: 0: 228.4. Samples: 18592. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:38,180][00194] Avg episode reward: [(0, '15.019')]
[2024-09-01 16:08:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 776.1). Total num frames: 4091904. Throughput: 0: 224.6. Samples: 19246. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:43,179][00194] Avg episode reward: [(0, '15.753')]
[2024-09-01 16:08:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 778.2). Total num frames: 4096000. Throughput: 0: 235.2. Samples: 20710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:08:48,181][00194] Avg episode reward: [(0, '15.722')]
[2024-09-01 16:08:53,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 741.2). Total num frames: 4096000. Throughput: 0: 227.3. Samples: 21896. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:08:53,184][00194] Avg episode reward: [(0, '15.941')]
[2024-09-01 16:08:53,307][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001001_4100096.pth...
[2024-09-01 16:08:53,313][26015] Updated weights for policy 0, policy_version 1001 (0.2141)
[2024-09-01 16:08:53,423][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth
[2024-09-01 16:08:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 782.0). Total num frames: 4104192. Throughput: 0: 239.6. Samples: 23218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:08:58,179][00194] Avg episode reward: [(0, '16.570')]
[2024-09-01 16:09:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 783.6). Total num frames: 4108288. Throughput: 0: 236.9. Samples: 24148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:03,180][00194] Avg episode reward: [(0, '17.257')]
[2024-09-01 16:09:08,181][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 785.0). Total num frames: 4112384. Throughput: 0: 230.3. Samples: 25192. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:08,184][00194] Avg episode reward: [(0, '17.191')]
[2024-09-01 16:09:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 786.4). Total num frames: 4116480. Throughput: 0: 222.3. Samples: 26682. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:13,179][00194] Avg episode reward: [(0, '18.269')]
[2024-09-01 16:09:18,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 787.7). Total num frames: 4120576. Throughput: 0: 237.4. Samples: 27560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:18,183][00194] Avg episode reward: [(0, '19.897')]
[2024-09-01 16:09:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 788.9). Total num frames: 4124672. Throughput: 0: 232.3. Samples: 29044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:23,180][00194] Avg episode reward: [(0, '19.967')]
[2024-09-01 16:09:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 789.9). Total num frames: 4128768. Throughput: 0: 244.5. Samples: 30250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:28,187][00194] Avg episode reward: [(0, '20.032')]
[2024-09-01 16:09:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 4136960. Throughput: 0: 226.6. Samples: 30908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:33,180][00194] Avg episode reward: [(0, '21.086')]
[2024-09-01 16:09:36,538][26015] Updated weights for policy 0, policy_version 1011 (0.2612)
[2024-09-01 16:09:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 4141056. Throughput: 0: 235.8. Samples: 32508. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:38,182][00194] Avg episode reward: [(0, '21.683')]
[2024-09-01 16:09:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4145152. Throughput: 0: 233.7. Samples: 33734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:09:43,180][00194] Avg episode reward: [(0, '21.870')]
[2024-09-01 16:09:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4149248. Throughput: 0: 226.2. Samples: 34326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:09:48,180][00194] Avg episode reward: [(0, '21.780')]
[2024-09-01 16:09:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 4153344. Throughput: 0: 236.4. Samples: 35830. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:53,180][00194] Avg episode reward: [(0, '22.352')]
[2024-09-01 16:09:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4157440. Throughput: 0: 238.3. Samples: 37406. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:58,180][00194] Avg episode reward: [(0, '22.221')]
[2024-09-01 16:10:03,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 4161536. Throughput: 0: 230.8. Samples: 37948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:10:03,197][00194] Avg episode reward: [(0, '22.833')]
[2024-09-01 16:10:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4165632. Throughput: 0: 233.2. Samples: 39540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:10:08,186][00194] Avg episode reward: [(0, '22.833')]
[2024-09-01 16:10:13,177][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 841.3). Total num frames: 4173824. Throughput: 0: 233.6. Samples: 40762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:10:13,185][00194] Avg episode reward: [(0, '22.382')]
[2024-09-01 16:10:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 840.8). Total num frames: 4177920. Throughput: 0: 238.2. Samples: 41626. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:18,183][00194] Avg episode reward: [(0, '22.234')]
[2024-09-01 16:10:22,418][26015] Updated weights for policy 0, policy_version 1021 (0.1004)
[2024-09-01 16:10:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 840.2). Total num frames: 4182016. Throughput: 0: 225.6. Samples: 42662. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:23,179][00194] Avg episode reward: [(0, '22.592')]
[2024-09-01 16:10:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 839.7). Total num frames: 4186112. Throughput: 0: 235.5. Samples: 44332. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:28,180][00194] Avg episode reward: [(0, '22.806')]
[2024-09-01 16:10:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 4190208. Throughput: 0: 241.3. Samples: 45184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:33,180][00194] Avg episode reward: [(0, '23.284')]
[2024-09-01 16:10:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.7). Total num frames: 4194304. Throughput: 0: 233.5. Samples: 46338. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:38,182][00194] Avg episode reward: [(0, '23.781')]
[2024-09-01 16:10:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.3). Total num frames: 4198400. Throughput: 0: 228.7. Samples: 47696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:43,182][00194] Avg episode reward: [(0, '24.124')]
[2024-09-01 16:10:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 856.4). Total num frames: 4206592. Throughput: 0: 237.8. Samples: 48648. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:48,180][00194] Avg episode reward: [(0, '24.933')]
[2024-09-01 16:10:52,079][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001028_4210688.pth...
[2024-09-01 16:10:52,205][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth
[2024-09-01 16:10:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 855.6). Total num frames: 4210688. Throughput: 0: 226.8. Samples: 49744. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:53,184][00194] Avg episode reward: [(0, '24.570')]
[2024-09-01 16:10:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 854.8). Total num frames: 4214784. Throughput: 0: 224.5. Samples: 50866. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:58,180][00194] Avg episode reward: [(0, '23.878')]
[2024-09-01 16:11:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 854.1). Total num frames: 4218880. Throughput: 0: 226.7. Samples: 51828. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:03,180][00194] Avg episode reward: [(0, '24.630')]
[2024-09-01 16:11:05,855][26015] Updated weights for policy 0, policy_version 1031 (0.1467)
[2024-09-01 16:11:08,178][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 853.3). Total num frames: 4222976. Throughput: 0: 238.5. Samples: 53394. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:08,181][00194] Avg episode reward: [(0, '24.416')]
[2024-09-01 16:11:08,999][26002] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 16:11:09,060][26015] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 16:11:10,186][26002] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 16:11:10,187][26015] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 16:11:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 852.6). Total num frames: 4227072. Throughput: 0: 223.8. Samples: 54402. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:13,184][00194] Avg episode reward: [(0, '23.874')]
[2024-09-01 16:11:18,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 852.0). Total num frames: 4231168. Throughput: 0: 217.6. Samples: 54976. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:18,179][00194] Avg episode reward: [(0, '23.679')]
[2024-09-01 16:11:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 851.3). Total num frames: 4235264. Throughput: 0: 236.1. Samples: 56962. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:11:23,185][00194] Avg episode reward: [(0, '24.322')]
[2024-09-01 16:11:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.7). Total num frames: 4239360. Throughput: 0: 229.0. Samples: 58000. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:11:28,184][00194] Avg episode reward: [(0, '24.549')]
[2024-09-01 16:11:33,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.1). Total num frames: 4243456. Throughput: 0: 220.7. Samples: 58580. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:11:33,189][00194] Avg episode reward: [(0, '24.854')]
[2024-09-01 16:11:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 849.5). Total num frames: 4247552. Throughput: 0: 220.2. Samples: 59654. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:11:38,184][00194] Avg episode reward: [(0, '24.462')]
[2024-09-01 16:11:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 849.0). Total num frames: 4251648. Throughput: 0: 222.4. Samples: 60874. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:43,184][00194] Avg episode reward: [(0, '24.454')]
[2024-09-01 16:11:48,179][00194] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 848.4). Total num frames: 4255744. Throughput: 0: 214.0. Samples: 61460. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:48,183][00194] Avg episode reward: [(0, '24.499')]
[2024-09-01 16:11:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.9). Total num frames: 4259840. Throughput: 0: 204.0. Samples: 62572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:53,180][00194] Avg episode reward: [(0, '24.757')]
[2024-09-01 16:11:54,518][26015] Updated weights for policy 0, policy_version 1041 (0.1754)
[2024-09-01 16:11:58,177][00194] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 847.4). Total num frames: 4263936. Throughput: 0: 215.5. Samples: 64100. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:58,182][00194] Avg episode reward: [(0, '24.680')]
[2024-09-01 16:12:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 4272128. Throughput: 0: 225.3. Samples: 65116. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:03,180][00194] Avg episode reward: [(0, '25.100')]
[2024-09-01 16:12:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4276224. Throughput: 0: 204.6. Samples: 66168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:08,182][00194] Avg episode reward: [(0, '25.220')]
[2024-09-01 16:12:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4280320. Throughput: 0: 210.9. Samples: 67492. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:13,184][00194] Avg episode reward: [(0, '25.315')]
[2024-09-01 16:12:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4284416. Throughput: 0: 213.9. Samples: 68206. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:18,182][00194] Avg episode reward: [(0, '25.625')]
[2024-09-01 16:12:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4288512. Throughput: 0: 225.8. Samples: 69814. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:23,180][00194] Avg episode reward: [(0, '25.851')]
[2024-09-01 16:12:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4292608. Throughput: 0: 222.7. Samples: 70894. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:28,185][00194] Avg episode reward: [(0, '25.901')]
[2024-09-01 16:12:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4296704. Throughput: 0: 222.2. Samples: 71458. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:33,180][00194] Avg episode reward: [(0, '26.142')]
[2024-09-01 16:12:37,639][26002] Saving new best policy, reward=26.142!
[2024-09-01 16:12:37,655][26015] Updated weights for policy 0, policy_version 1051 (0.1676)
[2024-09-01 16:12:38,179][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4304896. Throughput: 0: 238.1. Samples: 73286. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:38,189][00194] Avg episode reward: [(0, '26.086')]
[2024-09-01 16:12:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4304896. Throughput: 0: 226.9. Samples: 74310. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:43,184][00194] Avg episode reward: [(0, '25.556')]
[2024-09-01 16:12:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 4313088. Throughput: 0: 218.8. Samples: 74962. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:48,189][00194] Avg episode reward: [(0, '25.942')]
[2024-09-01 16:12:51,983][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001054_4317184.pth...
[2024-09-01 16:12:52,091][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001001_4100096.pth
[2024-09-01 16:12:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4317184. Throughput: 0: 228.3. Samples: 76440. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:53,180][00194] Avg episode reward: [(0, '25.123')]
[2024-09-01 16:12:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4321280. Throughput: 0: 232.6. Samples: 77960. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:58,183][00194] Avg episode reward: [(0, '25.533')]
[2024-09-01 16:13:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4325376. Throughput: 0: 226.7. Samples: 78406. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:03,179][00194] Avg episode reward: [(0, '25.730')]
[2024-09-01 16:13:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4329472. Throughput: 0: 222.8. Samples: 79842. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:08,185][00194] Avg episode reward: [(0, '26.175')]
[2024-09-01 16:13:09,985][26002] Saving new best policy, reward=26.175!
[2024-09-01 16:13:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4333568. Throughput: 0: 235.3. Samples: 81482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:13,182][00194] Avg episode reward: [(0, '25.694')]
[2024-09-01 16:13:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4337664. Throughput: 0: 235.0. Samples: 82034. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:18,180][00194] Avg episode reward: [(0, '26.236')]
[2024-09-01 16:13:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4341760. Throughput: 0: 218.6. Samples: 83124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:13:23,180][00194] Avg episode reward: [(0, '26.997')]
[2024-09-01 16:13:24,045][26002] Saving new best policy, reward=26.236!
[2024-09-01 16:13:24,052][26015] Updated weights for policy 0, policy_version 1061 (0.0564)
[2024-09-01 16:13:27,854][26002] Saving new best policy, reward=26.997!
[2024-09-01 16:13:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4349952. Throughput: 0: 228.0. Samples: 84570. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:28,180][00194] Avg episode reward: [(0, '26.932')]
[2024-09-01 16:13:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4354048. Throughput: 0: 234.2. Samples: 85500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:33,179][00194] Avg episode reward: [(0, '27.021')]
[2024-09-01 16:13:37,194][26002] Saving new best policy, reward=27.021!
[2024-09-01 16:13:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4358144. Throughput: 0: 222.4. Samples: 86446. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:38,183][00194] Avg episode reward: [(0, '27.035')]
[2024-09-01 16:13:41,904][26002] Saving new best policy, reward=27.035!
[2024-09-01 16:13:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4362240. Throughput: 0: 221.6. Samples: 87930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:43,184][00194] Avg episode reward: [(0, '26.874')]
[2024-09-01 16:13:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 4366336. Throughput: 0: 225.2. Samples: 88540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:13:48,179][00194] Avg episode reward: [(0, '26.624')]
[2024-09-01 16:13:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4370432. Throughput: 0: 229.1. Samples: 90150. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:13:53,180][00194] Avg episode reward: [(0, '26.624')]
[2024-09-01 16:13:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4374528. Throughput: 0: 217.7. Samples: 91278. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:13:58,184][00194] Avg episode reward: [(0, '26.628')]
[2024-09-01 16:14:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4378624. Throughput: 0: 223.2. Samples: 92076. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:14:03,186][00194] Avg episode reward: [(0, '25.625')]
[2024-09-01 16:14:07,921][26015] Updated weights for policy 0, policy_version 1071 (0.1072)
[2024-09-01 16:14:08,178][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4386816. Throughput: 0: 233.1. Samples: 93612. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:14:08,185][00194] Avg episode reward: [(0, '25.526')]
[2024-09-01 16:14:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4390912. Throughput: 0: 224.5. Samples: 94674. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:13,181][00194] Avg episode reward: [(0, '25.796')]
[2024-09-01 16:14:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4395008. Throughput: 0: 219.4. Samples: 95372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:18,180][00194] Avg episode reward: [(0, '25.692')]
[2024-09-01 16:14:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4399104. Throughput: 0: 228.6. Samples: 96732. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:23,182][00194] Avg episode reward: [(0, '25.448')]
[2024-09-01 16:14:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4403200. Throughput: 0: 236.1. Samples: 98556. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:28,181][00194] Avg episode reward: [(0, '24.703')]
[2024-09-01 16:14:33,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4407296. Throughput: 0: 229.3. Samples: 98858. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:33,184][00194] Avg episode reward: [(0, '24.815')]
[2024-09-01 16:14:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4411392. Throughput: 0: 223.9. Samples: 100224. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:38,179][00194] Avg episode reward: [(0, '24.032')]
[2024-09-01 16:14:43,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4415488. Throughput: 0: 235.2. Samples: 101864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:14:43,185][00194] Avg episode reward: [(0, '23.589')]
[2024-09-01 16:14:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4419584. Throughput: 0: 230.4. Samples: 102444. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:14:48,180][00194] Avg episode reward: [(0, '23.996')]
[2024-09-01 16:14:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4423680. Throughput: 0: 222.4. Samples: 103622. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:14:53,185][00194] Avg episode reward: [(0, '23.741')]
[2024-09-01 16:14:53,813][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001081_4427776.pth...
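The paired "Saving .../checkpoint_NNNN_MMMM.pth" and "Removing .../checkpoint_NNNN_MMMM.pth" records show the learner rotating checkpoints: each new save is followed by deleting the oldest one, so only a small number of recent checkpoints stay on disk. A minimal sketch of such a rotation, assuming sample-factory's `checkpoint_<policy_version>_<env_frames>.pth` naming and an assumed keep-count (this is an illustration, not the library's actual code):

```python
import os
import re

def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> list:
    """Delete all but the `keep` newest checkpoint_*.pth files, ordered by
    policy version encoded in the filename. Returns the removed filenames."""
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    ckpts = sorted(
        (f for f in os.listdir(ckpt_dir) if pattern.search(f)),
        key=lambda f: int(pattern.search(f).group(1)),  # sort by policy version
    )
    removed = ckpts[:-keep] if keep > 0 else ckpts
    for f in removed:
        os.remove(os.path.join(ckpt_dir, f))
    return removed
```

With `keep=2` this reproduces the pattern visible in the log: after `checkpoint_000001081_4427776.pth` is written, the two most recent files survive and the older `checkpoint_000001028_4210688.pth` is removed.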
[2024-09-01 16:14:53,817][26015] Updated weights for policy 0, policy_version 1081 (0.2107)
[2024-09-01 16:14:53,927][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001028_4210688.pth
[2024-09-01 16:14:56,082][26002] Signal inference workers to stop experience collection... (100 times)
[2024-09-01 16:14:56,152][26015] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2024-09-01 16:14:57,565][26002] Signal inference workers to resume experience collection... (100 times)
[2024-09-01 16:14:57,566][26015] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2024-09-01 16:14:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4431872. Throughput: 0: 231.9. Samples: 105108. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:14:58,182][00194] Avg episode reward: [(0, '24.017')]
[2024-09-01 16:15:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4435968. Throughput: 0: 233.4. Samples: 105874. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:15:03,180][00194] Avg episode reward: [(0, '23.470')]
[2024-09-01 16:15:08,185][00194] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4440064. Throughput: 0: 229.3. Samples: 107050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:08,198][00194] Avg episode reward: [(0, '22.411')]
[2024-09-01 16:15:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4444160. Throughput: 0: 223.7. Samples: 108622. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:13,179][00194] Avg episode reward: [(0, '22.251')]
[2024-09-01 16:15:18,177][00194] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4448256. Throughput: 0: 232.1. Samples: 109304. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:18,180][00194] Avg episode reward: [(0, '22.522')]
[2024-09-01 16:15:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4452352. Throughput: 0: 233.8. Samples: 110744. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:23,180][00194] Avg episode reward: [(0, '22.716')]
[2024-09-01 16:15:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4456448. Throughput: 0: 224.8. Samples: 111978. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:28,179][00194] Avg episode reward: [(0, '22.022')]
[2024-09-01 16:15:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 4464640. Throughput: 0: 226.5. Samples: 112638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:33,183][00194] Avg episode reward: [(0, '22.062')]
[2024-09-01 16:15:37,299][26015] Updated weights for policy 0, policy_version 1091 (0.1942)
[2024-09-01 16:15:38,180][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4468736. Throughput: 0: 235.2. Samples: 114206. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:38,183][00194] Avg episode reward: [(0, '21.303')]
[2024-09-01 16:15:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4472832. Throughput: 0: 225.9. Samples: 115272. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:43,180][00194] Avg episode reward: [(0, '21.581')]
[2024-09-01 16:15:48,177][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4476928. Throughput: 0: 224.9. Samples: 115994. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:48,179][00194] Avg episode reward: [(0, '22.360')]
[2024-09-01 16:15:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4481024. Throughput: 0: 237.6. Samples: 117742. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:53,181][00194] Avg episode reward: [(0, '21.657')]
[2024-09-01 16:15:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4485120. Throughput: 0: 236.6. Samples: 119268. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:15:58,180][00194] Avg episode reward: [(0, '21.595')]
[2024-09-01 16:16:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4489216. Throughput: 0: 229.3. Samples: 119624. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:16:03,183][00194] Avg episode reward: [(0, '22.423')]
[2024-09-01 16:16:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 4493312. Throughput: 0: 234.3. Samples: 121288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:16:08,189][00194] Avg episode reward: [(0, '22.373')]
[2024-09-01 16:16:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4497408. Throughput: 0: 221.8. Samples: 121960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:16:13,181][00194] Avg episode reward: [(0, '22.370')]
[2024-09-01 16:16:18,180][00194] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4497408. Throughput: 0: 214.6. Samples: 122294. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:16:18,186][00194] Avg episode reward: [(0, '22.165')]
[2024-09-01 16:16:23,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4501504. Throughput: 0: 196.3. Samples: 123040. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:16:23,183][00194] Avg episode reward: [(0, '22.188')]
[2024-09-01 16:16:28,177][00194] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4505600. Throughput: 0: 202.1. Samples: 124366. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:16:28,182][00194] Avg episode reward: [(0, '22.167')]
[2024-09-01 16:16:28,861][26015] Updated weights for policy 0, policy_version 1101 (0.2124)
[2024-09-01 16:16:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4513792. Throughput: 0: 207.0. Samples: 125310. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:16:33,180][00194] Avg episode reward: [(0, '22.660')]
[2024-09-01 16:16:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4517888. Throughput: 0: 193.6. Samples: 126456. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:16:38,184][00194] Avg episode reward: [(0, '22.608')]
[2024-09-01 16:16:43,177][00194] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 4517888. Throughput: 0: 183.9. Samples: 127542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:16:43,183][00194] Avg episode reward: [(0, '21.767')]
[2024-09-01 16:16:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4526080. Throughput: 0: 198.0. Samples: 128536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:16:48,184][00194] Avg episode reward: [(0, '21.312')]
[2024-09-01 16:16:51,127][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001106_4530176.pth...
[2024-09-01 16:16:51,247][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001054_4317184.pth
[2024-09-01 16:16:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4530176. Throughput: 0: 193.5. Samples: 129996. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:16:53,190][00194] Avg episode reward: [(0, '21.431')]
[2024-09-01 16:16:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4534272. Throughput: 0: 199.9. Samples: 130954. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:16:58,182][00194] Avg episode reward: [(0, '21.104')]
[2024-09-01 16:17:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4538368. Throughput: 0: 207.7. Samples: 131642. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:17:03,179][00194] Avg episode reward: [(0, '20.777')]
[2024-09-01 16:17:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4542464. Throughput: 0: 229.0. Samples: 133346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:17:08,180][00194] Avg episode reward: [(0, '21.631')]
[2024-09-01 16:17:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4546560. Throughput: 0: 224.0. Samples: 134448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:17:13,186][00194] Avg episode reward: [(0, '21.594')]
[2024-09-01 16:17:14,987][26015] Updated weights for policy 0, policy_version 1111 (0.1004)
[2024-09-01 16:17:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4550656. Throughput: 0: 213.6. Samples: 134920. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:17:18,185][00194] Avg episode reward: [(0, '21.538')]
[2024-09-01 16:17:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4554752. Throughput: 0: 224.7. Samples: 136568. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:17:23,185][00194] Avg episode reward: [(0, '22.171')]
[2024-09-01 16:17:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4562944. Throughput: 0: 231.0. Samples: 137936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:17:28,179][00194] Avg episode reward: [(0, '22.637')]
[2024-09-01 16:17:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 4562944. Throughput: 0: 225.6. Samples: 138686. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:17:33,185][00194] Avg episode reward: [(0, '22.898')]
[2024-09-01 16:17:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4571136. Throughput: 0: 216.4. Samples: 139736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:38,180][00194] Avg episode reward: [(0, '23.147')]
[2024-09-01 16:17:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4575232. Throughput: 0: 225.0. Samples: 141078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:43,180][00194] Avg episode reward: [(0, '23.221')]
[2024-09-01 16:17:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4579328. Throughput: 0: 229.0. Samples: 141946. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:48,184][00194] Avg episode reward: [(0, '23.405')]
[2024-09-01 16:17:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4583424. Throughput: 0: 215.0. Samples: 143022. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:53,182][00194] Avg episode reward: [(0, '23.405')]
[2024-09-01 16:17:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4587520. Throughput: 0: 226.2. Samples: 144626. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:17:58,186][00194] Avg episode reward: [(0, '23.392')]
[2024-09-01 16:18:00,196][26015] Updated weights for policy 0, policy_version 1121 (0.0537)
[2024-09-01 16:18:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4591616. Throughput: 0: 231.2. Samples: 145324. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:18:03,186][00194] Avg episode reward: [(0, '23.385')]
[2024-09-01 16:18:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4595712. Throughput: 0: 225.6. Samples: 146718.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:18:08,180][00194] Avg episode reward: [(0, '23.552')] [2024-09-01 16:18:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4599808. Throughput: 0: 218.5. Samples: 147768. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:18:13,182][00194] Avg episode reward: [(0, '23.662')] [2024-09-01 16:18:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4608000. Throughput: 0: 222.4. Samples: 148692. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:18:18,189][00194] Avg episode reward: [(0, '23.160')] [2024-09-01 16:18:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4612096. Throughput: 0: 229.2. Samples: 150052. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:18:23,180][00194] Avg episode reward: [(0, '23.275')] [2024-09-01 16:18:28,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4616192. Throughput: 0: 223.0. Samples: 151112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:18:28,184][00194] Avg episode reward: [(0, '23.210')] [2024-09-01 16:18:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4620288. Throughput: 0: 223.3. Samples: 151996. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:18:33,180][00194] Avg episode reward: [(0, '23.776')] [2024-09-01 16:18:38,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4624384. Throughput: 0: 226.7. Samples: 153224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:18:38,179][00194] Avg episode reward: [(0, '23.531')] [2024-09-01 16:18:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4628480. Throughput: 0: 226.6. Samples: 154824. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:18:43,185][00194] Avg episode reward: [(0, '24.070')] [2024-09-01 16:18:45,617][26015] Updated weights for policy 0, policy_version 1131 (0.1686) [2024-09-01 16:18:48,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4632576. Throughput: 0: 218.0. Samples: 155136. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:18:48,182][00194] Avg episode reward: [(0, '25.152')] [2024-09-01 16:18:49,159][26002] Signal inference workers to stop experience collection... (150 times) [2024-09-01 16:18:49,251][26015] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-09-01 16:18:50,348][26002] Signal inference workers to resume experience collection... (150 times) [2024-09-01 16:18:50,349][26015] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-09-01 16:18:50,362][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001132_4636672.pth... [2024-09-01 16:18:50,482][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001081_4427776.pth [2024-09-01 16:18:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4636672. Throughput: 0: 222.9. Samples: 156748. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:18:53,179][00194] Avg episode reward: [(0, '25.359')] [2024-09-01 16:18:58,177][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4644864. Throughput: 0: 229.2. Samples: 158082. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:18:58,185][00194] Avg episode reward: [(0, '25.478')] [2024-09-01 16:19:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4644864. Throughput: 0: 225.7. Samples: 158848. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:19:03,185][00194] Avg episode reward: [(0, '25.145')]
[2024-09-01 16:19:08,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4648960. Throughput: 0: 221.0. Samples: 159998. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:19:08,179][00194] Avg episode reward: [(0, '24.893')]
[2024-09-01 16:19:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4657152. Throughput: 0: 229.8. Samples: 161454. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:13,180][00194] Avg episode reward: [(0, '24.973')]
[2024-09-01 16:19:18,180][00194] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4661248. Throughput: 0: 227.1. Samples: 162216. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:18,190][00194] Avg episode reward: [(0, '24.829')]
[2024-09-01 16:19:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4665344. Throughput: 0: 223.3. Samples: 163272. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:23,185][00194] Avg episode reward: [(0, '23.987')]
[2024-09-01 16:19:28,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4669440. Throughput: 0: 223.1. Samples: 164862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:28,180][00194] Avg episode reward: [(0, '23.948')]
[2024-09-01 16:19:30,568][26015] Updated weights for policy 0, policy_version 1141 (0.0525)
[2024-09-01 16:19:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4673536. Throughput: 0: 230.9. Samples: 165524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:33,188][00194] Avg episode reward: [(0, '23.924')]
[2024-09-01 16:19:38,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4677632. Throughput: 0: 224.2. Samples: 166836. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:38,185][00194] Avg episode reward: [(0, '23.835')]
[2024-09-01 16:19:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4681728. Throughput: 0: 222.2. Samples: 168080. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:43,180][00194] Avg episode reward: [(0, '23.820')]
[2024-09-01 16:19:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4685824. Throughput: 0: 223.5. Samples: 168904. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:48,180][00194] Avg episode reward: [(0, '25.249')]
[2024-09-01 16:19:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4694016. Throughput: 0: 230.3. Samples: 170362. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:53,183][00194] Avg episode reward: [(0, '25.912')]
[2024-09-01 16:19:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 4694016. Throughput: 0: 220.9. Samples: 171394. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:58,182][00194] Avg episode reward: [(0, '25.991')]
[2024-09-01 16:20:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4702208. Throughput: 0: 219.8. Samples: 172108. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:03,186][00194] Avg episode reward: [(0, '26.219')]
[2024-09-01 16:20:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4706304. Throughput: 0: 226.5. Samples: 173464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:08,180][00194] Avg episode reward: [(0, '25.631')]
[2024-09-01 16:20:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4710400. Throughput: 0: 227.4. Samples: 175094. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:13,180][00194] Avg episode reward: [(0, '26.296')]
[2024-09-01 16:20:16,548][26015] Updated weights for policy 0, policy_version 1151 (0.1530)
[2024-09-01 16:20:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4714496. Throughput: 0: 222.3. Samples: 175526. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:18,183][00194] Avg episode reward: [(0, '26.681')]
[2024-09-01 16:20:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4718592. Throughput: 0: 224.0. Samples: 176914. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:23,181][00194] Avg episode reward: [(0, '26.178')]
[2024-09-01 16:20:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4722688. Throughput: 0: 230.8. Samples: 178464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:28,180][00194] Avg episode reward: [(0, '26.088')]
[2024-09-01 16:20:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4726784. Throughput: 0: 228.2. Samples: 179174. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:33,180][00194] Avg episode reward: [(0, '25.861')]
[2024-09-01 16:20:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4730880. Throughput: 0: 217.6. Samples: 180152. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:38,180][00194] Avg episode reward: [(0, '26.192')]
[2024-09-01 16:20:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4739072. Throughput: 0: 231.6. Samples: 181818. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:20:43,179][00194] Avg episode reward: [(0, '27.073')]
[2024-09-01 16:20:46,499][26002] Saving new best policy, reward=27.073!
[2024-09-01 16:20:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4743168. Throughput: 0: 235.6. Samples: 182710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:48,184][00194] Avg episode reward: [(0, '26.905')]
[2024-09-01 16:20:52,211][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001159_4747264.pth...
[2024-09-01 16:20:52,288][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001106_4530176.pth
[2024-09-01 16:20:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4747264. Throughput: 0: 226.1. Samples: 183638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:53,188][00194] Avg episode reward: [(0, '26.621')]
[2024-09-01 16:20:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4751360. Throughput: 0: 222.8. Samples: 185118. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:58,185][00194] Avg episode reward: [(0, '26.404')]
[2024-09-01 16:21:00,777][26015] Updated weights for policy 0, policy_version 1161 (0.0669)
[2024-09-01 16:21:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4755456. Throughput: 0: 228.1. Samples: 185790. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:03,198][00194] Avg episode reward: [(0, '26.060')]
[2024-09-01 16:21:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4759552. Throughput: 0: 228.3. Samples: 187186. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:21:08,184][00194] Avg episode reward: [(0, '26.880')]
[2024-09-01 16:21:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4763648. Throughput: 0: 220.7. Samples: 188396. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:21:13,181][00194] Avg episode reward: [(0, '27.494')]
[2024-09-01 16:21:15,305][26002] Saving new best policy, reward=27.494!
[2024-09-01 16:21:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4767744. Throughput: 0: 222.0. Samples: 189162. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:21:18,189][00194] Avg episode reward: [(0, '27.752')]
[2024-09-01 16:21:23,044][26002] Saving new best policy, reward=27.752!
[2024-09-01 16:21:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4775936. Throughput: 0: 238.4. Samples: 190880. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:23,180][00194] Avg episode reward: [(0, '28.043')]
[2024-09-01 16:21:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4775936. Throughput: 0: 223.1. Samples: 191856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:28,180][00194] Avg episode reward: [(0, '28.605')]
[2024-09-01 16:21:28,766][26002] Saving new best policy, reward=28.043!
[2024-09-01 16:21:33,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4780032. Throughput: 0: 215.9. Samples: 192426. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:33,185][00194] Avg episode reward: [(0, '28.438')]
[2024-09-01 16:21:33,496][26002] Saving new best policy, reward=28.605!
[2024-09-01 16:21:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4788224. Throughput: 0: 227.5. Samples: 193876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:38,183][00194] Avg episode reward: [(0, '28.370')]
[2024-09-01 16:21:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4792320. Throughput: 0: 218.1. Samples: 194932. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:43,184][00194] Avg episode reward: [(0, '28.777')]
[2024-09-01 16:21:46,828][26002] Saving new best policy, reward=28.777!
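The `Saving`/`Removing` records above rotate checkpoints named `checkpoint_<policy_version>_<total_frames>.pth`, and in this run the frame count is always `policy_version * 4096` (e.g. 1106 * 4096 = 4530176). A minimal sketch of recovering both numbers from such a filename; `parse_checkpoint_name` is a hypothetical helper written for illustration, not part of Sample Factory:

```python
import re

def parse_checkpoint_name(path):
    """Extract (policy_version, total_frames) from a checkpoint filename
    like 'checkpoint_000001106_4530176.pth' (the naming seen in this log)."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001106_4530176.pth")
# In this log, frames == version * 4096 (one 4096-frame rollout batch per policy version).
print(version, frames, frames // version)  # → 1106 4530176 4096
```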
[2024-09-01 16:21:46,833][26015] Updated weights for policy 0, policy_version 1171 (0.0621)
[2024-09-01 16:21:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4796416. Throughput: 0: 225.7. Samples: 195948. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:48,180][00194] Avg episode reward: [(0, '28.218')]
[2024-09-01 16:21:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4800512. Throughput: 0: 221.9. Samples: 197170. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:53,184][00194] Avg episode reward: [(0, '27.670')]
[2024-09-01 16:21:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4804608. Throughput: 0: 237.2. Samples: 199070. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:58,179][00194] Avg episode reward: [(0, '27.562')]
[2024-09-01 16:22:03,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4808704. Throughput: 0: 225.7. Samples: 199320. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:03,182][00194] Avg episode reward: [(0, '27.447')]
[2024-09-01 16:22:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4812800. Throughput: 0: 214.6. Samples: 200536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:08,179][00194] Avg episode reward: [(0, '27.085')]
[2024-09-01 16:22:13,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4816896. Throughput: 0: 229.4. Samples: 202180. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:13,186][00194] Avg episode reward: [(0, '27.707')]
[2024-09-01 16:22:18,180][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4825088. Throughput: 0: 235.8. Samples: 203036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:18,183][00194] Avg episode reward: [(0, '27.036')]
[2024-09-01 16:22:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4825088. Throughput: 0: 232.8. Samples: 204354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:23,180][00194] Avg episode reward: [(0, '27.002')]
[2024-09-01 16:22:28,177][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4833280. Throughput: 0: 233.3. Samples: 205430. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:22:28,186][00194] Avg episode reward: [(0, '27.165')]
[2024-09-01 16:22:31,927][26015] Updated weights for policy 0, policy_version 1181 (0.1670)
[2024-09-01 16:22:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4837376. Throughput: 0: 233.2. Samples: 206442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:22:33,180][00194] Avg episode reward: [(0, '27.113')]
[2024-09-01 16:22:34,243][26002] Signal inference workers to stop experience collection... (200 times)
[2024-09-01 16:22:34,308][26015] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-09-01 16:22:35,711][26002] Signal inference workers to resume experience collection... (200 times)
[2024-09-01 16:22:35,713][26015] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-09-01 16:22:38,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4841472. Throughput: 0: 232.7. Samples: 207642. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:38,190][00194] Avg episode reward: [(0, '26.311')]
[2024-09-01 16:22:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4845568. Throughput: 0: 217.6. Samples: 208860. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:43,180][00194] Avg episode reward: [(0, '25.657')]
[2024-09-01 16:22:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4849664. Throughput: 0: 225.9. Samples: 209486. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:48,181][00194] Avg episode reward: [(0, '25.622')]
[2024-09-01 16:22:50,066][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001185_4853760.pth...
[2024-09-01 16:22:50,182][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001132_4636672.pth
[2024-09-01 16:22:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4853760. Throughput: 0: 240.2. Samples: 211344. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:53,185][00194] Avg episode reward: [(0, '25.926')]
[2024-09-01 16:22:58,180][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4857856. Throughput: 0: 226.4. Samples: 212368. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:22:58,188][00194] Avg episode reward: [(0, '25.848')]
[2024-09-01 16:23:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4861952. Throughput: 0: 219.8. Samples: 212928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:23:03,180][00194] Avg episode reward: [(0, '25.533')]
[2024-09-01 16:23:08,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4866048. Throughput: 0: 228.4. Samples: 214634. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:23:08,184][00194] Avg episode reward: [(0, '24.453')]
[2024-09-01 16:23:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4874240. Throughput: 0: 234.2. Samples: 215968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:23:13,181][00194] Avg episode reward: [(0, '23.632')]
[2024-09-01 16:23:17,650][26015] Updated weights for policy 0, policy_version 1191 (0.1826)
[2024-09-01 16:23:18,177][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4878336. Throughput: 0: 226.5. Samples: 216634. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:23:18,184][00194] Avg episode reward: [(0, '23.473')]
[2024-09-01 16:23:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4882432. Throughput: 0: 221.3. Samples: 217600. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:23:23,180][00194] Avg episode reward: [(0, '24.007')]
[2024-09-01 16:23:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4886528. Throughput: 0: 235.7. Samples: 219466. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:23:28,180][00194] Avg episode reward: [(0, '24.305')]
[2024-09-01 16:23:33,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4890624. Throughput: 0: 229.4. Samples: 219810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:33,182][00194] Avg episode reward: [(0, '24.018')]
[2024-09-01 16:23:38,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4894720. Throughput: 0: 215.8. Samples: 221054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:38,186][00194] Avg episode reward: [(0, '24.131')]
[2024-09-01 16:23:43,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4898816. Throughput: 0: 229.1. Samples: 222678. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:43,179][00194] Avg episode reward: [(0, '23.711')]
[2024-09-01 16:23:48,177][00194] Fps is (10 sec: 1229.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4907008. Throughput: 0: 232.4. Samples: 223386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:48,180][00194] Avg episode reward: [(0, '24.096')]
[2024-09-01 16:23:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4907008. Throughput: 0: 224.2. Samples: 224722. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:53,180][00194] Avg episode reward: [(0, '24.100')]
[2024-09-01 16:23:58,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4911104. Throughput: 0: 217.7. Samples: 225764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:58,185][00194] Avg episode reward: [(0, '24.015')]
[2024-09-01 16:24:02,535][26015] Updated weights for policy 0, policy_version 1201 (0.2548)
[2024-09-01 16:24:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4919296. Throughput: 0: 224.9. Samples: 226756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:03,180][00194] Avg episode reward: [(0, '24.153')]
[2024-09-01 16:24:08,179][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4923392. Throughput: 0: 235.8. Samples: 228210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:08,186][00194] Avg episode reward: [(0, '24.533')]
[2024-09-01 16:24:13,182][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4927488. Throughput: 0: 216.9. Samples: 229228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:13,190][00194] Avg episode reward: [(0, '24.135')]
[2024-09-01 16:24:18,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4931584. Throughput: 0: 223.7. Samples: 229878. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:18,180][00194] Avg episode reward: [(0, '24.265')]
[2024-09-01 16:24:23,177][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4935680. Throughput: 0: 237.3. Samples: 231730.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:23,180][00194] Avg episode reward: [(0, '25.074')]
[2024-09-01 16:24:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4939776. Throughput: 0: 227.8. Samples: 232930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:28,180][00194] Avg episode reward: [(0, '25.369')]
[2024-09-01 16:24:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4943872. Throughput: 0: 220.6. Samples: 233312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:33,186][00194] Avg episode reward: [(0, '25.850')]
[2024-09-01 16:24:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 4952064. Throughput: 0: 227.4. Samples: 234956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:38,182][00194] Avg episode reward: [(0, '25.851')]
[2024-09-01 16:24:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4956160. Throughput: 0: 235.2. Samples: 236346. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:43,184][00194] Avg episode reward: [(0, '26.430')]
[2024-09-01 16:24:47,505][26015] Updated weights for policy 0, policy_version 1211 (0.0542)
[2024-09-01 16:24:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4960256. Throughput: 0: 228.0. Samples: 237016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:48,180][00194] Avg episode reward: [(0, '26.794')]
[2024-09-01 16:24:52,208][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001212_4964352.pth...
[2024-09-01 16:24:52,323][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001159_4747264.pth
[2024-09-01 16:24:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4964352. Throughput: 0: 217.7. Samples: 238006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:24:53,180][00194] Avg episode reward: [(0, '26.405')]
[2024-09-01 16:24:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4968448. Throughput: 0: 232.1. Samples: 239670. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:24:58,180][00194] Avg episode reward: [(0, '25.465')]
[2024-09-01 16:25:03,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4972544. Throughput: 0: 232.1. Samples: 240322. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:03,184][00194] Avg episode reward: [(0, '25.199')]
[2024-09-01 16:25:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4976640. Throughput: 0: 214.8. Samples: 241394. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:08,186][00194] Avg episode reward: [(0, '25.529')]
[2024-09-01 16:25:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4980736. Throughput: 0: 226.1. Samples: 243106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:13,179][00194] Avg episode reward: [(0, '25.556')]
[2024-09-01 16:25:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4988928. Throughput: 0: 236.1. Samples: 243938. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:18,180][00194] Avg episode reward: [(0, '26.311')]
[2024-09-01 16:25:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4988928. Throughput: 0: 224.8. Samples: 245072. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:23,180][00194] Avg episode reward: [(0, '25.548')]
[2024-09-01 16:25:28,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4993024. Throughput: 0: 219.0. Samples: 246200. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:28,179][00194] Avg episode reward: [(0, '25.154')]
[2024-09-01 16:25:32,397][26015] Updated weights for policy 0, policy_version 1221 (0.1037)
[2024-09-01 16:25:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5001216. Throughput: 0: 223.4. Samples: 247070. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:33,180][00194] Avg episode reward: [(0, '25.424')]
[2024-09-01 16:25:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5005312. Throughput: 0: 228.9. Samples: 248306. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:38,181][00194] Avg episode reward: [(0, '24.996')]
[2024-09-01 16:25:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5009408. Throughput: 0: 219.6. Samples: 249550. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:43,182][00194] Avg episode reward: [(0, '25.339')]
[2024-09-01 16:25:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5013504. Throughput: 0: 220.8. Samples: 250260. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:48,185][00194] Avg episode reward: [(0, '25.125')]
[2024-09-01 16:25:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5017600. Throughput: 0: 237.2. Samples: 252066. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:53,180][00194] Avg episode reward: [(0, '25.237')]
[2024-09-01 16:25:58,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5021696. Throughput: 0: 227.4. Samples: 253340. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:58,180][00194] Avg episode reward: [(0, '25.443')]
[2024-09-01 16:26:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5025792. Throughput: 0: 216.5. Samples: 253682. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:03,184][00194] Avg episode reward: [(0, '24.973')]
[2024-09-01 16:26:08,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5029888. Throughput: 0: 229.0. Samples: 255378. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:08,185][00194] Avg episode reward: [(0, '25.326')]
[2024-09-01 16:26:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5038080. Throughput: 0: 233.8. Samples: 256720. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:13,184][00194] Avg episode reward: [(0, '25.131')]
[2024-09-01 16:26:17,755][26015] Updated weights for policy 0, policy_version 1231 (0.0622)
[2024-09-01 16:26:18,178][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5042176. Throughput: 0: 230.3. Samples: 257432. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:18,182][00194] Avg episode reward: [(0, '24.910')]
[2024-09-01 16:26:21,209][26002] Signal inference workers to stop experience collection... (250 times)
[2024-09-01 16:26:21,250][26015] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-09-01 16:26:22,397][26002] Signal inference workers to resume experience collection... (250 times)
[2024-09-01 16:26:22,398][26015] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-09-01 16:26:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5046272. Throughput: 0: 226.0. Samples: 258476. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:23,185][00194] Avg episode reward: [(0, '24.996')]
[2024-09-01 16:26:28,177][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5050368. Throughput: 0: 238.2. Samples: 260268. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:28,179][00194] Avg episode reward: [(0, '25.330')]
[2024-09-01 16:26:33,181][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5054464. Throughput: 0: 234.2. Samples: 260800. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:33,188][00194] Avg episode reward: [(0, '24.743')]
[2024-09-01 16:26:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5058560. Throughput: 0: 217.7. Samples: 261862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:38,180][00194] Avg episode reward: [(0, '24.999')]
[2024-09-01 16:26:43,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5062656. Throughput: 0: 226.2. Samples: 263518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:43,181][00194] Avg episode reward: [(0, '24.556')]
[2024-09-01 16:26:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5070848. Throughput: 0: 233.8. Samples: 264204. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:48,179][00194] Avg episode reward: [(0, '24.914')]
[2024-09-01 16:26:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5070848. Throughput: 0: 227.9. Samples: 265632. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:26:53,180][00194] Avg episode reward: [(0, '24.642')]
[2024-09-01 16:26:53,769][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth...
[2024-09-01 16:26:53,847][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001185_4853760.pth
[2024-09-01 16:26:58,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5074944. Throughput: 0: 220.5. Samples: 266642.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:58,185][00194] Avg episode reward: [(0, '25.300')] [2024-09-01 16:27:02,466][26015] Updated weights for policy 0, policy_version 1241 (0.1041) [2024-09-01 16:27:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5083136. Throughput: 0: 228.3. Samples: 267704. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:03,179][00194] Avg episode reward: [(0, '25.853')] [2024-09-01 16:27:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5087232. Throughput: 0: 228.9. Samples: 268778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:08,184][00194] Avg episode reward: [(0, '26.317')] [2024-09-01 16:27:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5091328. Throughput: 0: 215.9. Samples: 269982. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:13,180][00194] Avg episode reward: [(0, '26.383')] [2024-09-01 16:27:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5095424. Throughput: 0: 220.7. Samples: 270730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:18,179][00194] Avg episode reward: [(0, '26.575')] [2024-09-01 16:27:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5099520. Throughput: 0: 230.7. Samples: 272244. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:23,179][00194] Avg episode reward: [(0, '25.886')] [2024-09-01 16:27:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5103616. Throughput: 0: 211.2. Samples: 273020. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:28,181][00194] Avg episode reward: [(0, '26.109')] [2024-09-01 16:27:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5107712. Throughput: 0: 219.8. Samples: 274096. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:33,182][00194] Avg episode reward: [(0, '26.587')] [2024-09-01 16:27:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5111808. Throughput: 0: 225.7. Samples: 275788. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:38,179][00194] Avg episode reward: [(0, '25.520')] [2024-09-01 16:27:43,178][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5120000. Throughput: 0: 218.7. Samples: 276484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:27:43,182][00194] Avg episode reward: [(0, '26.484')] [2024-09-01 16:27:48,000][26015] Updated weights for policy 0, policy_version 1251 (0.1605) [2024-09-01 16:27:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5124096. Throughput: 0: 225.5. Samples: 277850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:27:48,184][00194] Avg episode reward: [(0, '26.844')] [2024-09-01 16:27:53,177][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5128192. Throughput: 0: 226.1. Samples: 278952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:27:53,180][00194] Avg episode reward: [(0, '27.357')] [2024-09-01 16:27:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5132288. Throughput: 0: 235.6. Samples: 280584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:27:58,180][00194] Avg episode reward: [(0, '27.600')] [2024-09-01 16:28:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5136384. Throughput: 0: 229.6. Samples: 281062. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:03,179][00194] Avg episode reward: [(0, '27.089')] [2024-09-01 16:28:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5140480. Throughput: 0: 223.0. Samples: 282278. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:08,185][00194] Avg episode reward: [(0, '26.846')] [2024-09-01 16:28:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5144576. Throughput: 0: 243.2. Samples: 283962. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:13,179][00194] Avg episode reward: [(0, '26.327')] [2024-09-01 16:28:18,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5148672. Throughput: 0: 235.3. Samples: 284686. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:18,187][00194] Avg episode reward: [(0, '26.687')] [2024-09-01 16:28:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5152768. Throughput: 0: 225.2. Samples: 285922. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:23,183][00194] Avg episode reward: [(0, '26.512')] [2024-09-01 16:28:28,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5156864. Throughput: 0: 237.1. Samples: 287152. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:28,179][00194] Avg episode reward: [(0, '26.852')] [2024-09-01 16:28:32,670][26015] Updated weights for policy 0, policy_version 1261 (0.1477) [2024-09-01 16:28:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5165056. Throughput: 0: 222.5. Samples: 287864. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:33,180][00194] Avg episode reward: [(0, '26.852')] [2024-09-01 16:28:38,180][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5169152. Throughput: 0: 227.6. Samples: 289194. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:38,187][00194] Avg episode reward: [(0, '26.862')] [2024-09-01 16:28:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5173248. Throughput: 0: 220.3. Samples: 290496. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:43,184][00194] Avg episode reward: [(0, '27.232')] [2024-09-01 16:28:48,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5177344. Throughput: 0: 224.7. Samples: 291172. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:48,185][00194] Avg episode reward: [(0, '27.631')] [2024-09-01 16:28:50,752][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001265_5181440.pth... [2024-09-01 16:28:50,864][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001212_4964352.pth [2024-09-01 16:28:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5181440. Throughput: 0: 233.2. Samples: 292770. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:53,184][00194] Avg episode reward: [(0, '27.701')] [2024-09-01 16:28:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5185536. Throughput: 0: 228.1. Samples: 294226. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:28:58,180][00194] Avg episode reward: [(0, '27.442')] [2024-09-01 16:29:03,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5189632. Throughput: 0: 218.4. Samples: 294512. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:03,189][00194] Avg episode reward: [(0, '27.442')] [2024-09-01 16:29:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5193728. Throughput: 0: 223.6. Samples: 295986. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:08,179][00194] Avg episode reward: [(0, '28.054')] [2024-09-01 16:29:13,177][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5201920. Throughput: 0: 227.2. Samples: 297374. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:13,184][00194] Avg episode reward: [(0, '27.852')] [2024-09-01 16:29:17,923][26015] Updated weights for policy 0, policy_version 1271 (0.1976) [2024-09-01 16:29:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5206016. Throughput: 0: 228.8. Samples: 298162. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:18,181][00194] Avg episode reward: [(0, '28.579')] [2024-09-01 16:29:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5210112. Throughput: 0: 227.1. Samples: 299414. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:23,179][00194] Avg episode reward: [(0, '28.837')] [2024-09-01 16:29:26,920][26002] Saving new best policy, reward=28.837! [2024-09-01 16:29:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5214208. Throughput: 0: 232.1. Samples: 300942. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:28,180][00194] Avg episode reward: [(0, '29.127')] [2024-09-01 16:29:30,759][26002] Saving new best policy, reward=29.127! [2024-09-01 16:29:33,183][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5218304. Throughput: 0: 231.8. Samples: 301604. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:33,186][00194] Avg episode reward: [(0, '29.376')] [2024-09-01 16:29:36,180][26002] Saving new best policy, reward=29.376! [2024-09-01 16:29:38,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5222400. Throughput: 0: 218.9. Samples: 302620. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:38,183][00194] Avg episode reward: [(0, '29.563')] [2024-09-01 16:29:41,129][26002] Saving new best policy, reward=29.563! [2024-09-01 16:29:43,177][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5226496. Throughput: 0: 223.8. Samples: 304298. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:29:43,185][00194] Avg episode reward: [(0, '28.905')] [2024-09-01 16:29:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5230592. Throughput: 0: 232.2. Samples: 304960. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:29:48,185][00194] Avg episode reward: [(0, '29.316')] [2024-09-01 16:29:53,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5234688. Throughput: 0: 227.9. Samples: 306240. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:53,184][00194] Avg episode reward: [(0, '29.042')] [2024-09-01 16:29:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5238784. Throughput: 0: 223.7. Samples: 307440. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:58,180][00194] Avg episode reward: [(0, '28.826')] [2024-09-01 16:30:03,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5242880. Throughput: 0: 223.8. Samples: 308232. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:30:03,184][00194] Avg episode reward: [(0, '28.151')] [2024-09-01 16:30:03,523][26015] Updated weights for policy 0, policy_version 1281 (0.0552) [2024-09-01 16:30:05,793][26002] Signal inference workers to stop experience collection... (300 times) [2024-09-01 16:30:05,836][26015] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-09-01 16:30:07,287][26002] Signal inference workers to resume experience collection... (300 times) [2024-09-01 16:30:07,289][26015] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-09-01 16:30:08,177][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5251072. Throughput: 0: 227.9. Samples: 309668. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:08,180][00194] Avg episode reward: [(0, '28.279')] [2024-09-01 16:30:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5255168. Throughput: 0: 207.8. Samples: 310294. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:13,180][00194] Avg episode reward: [(0, '28.296')] [2024-09-01 16:30:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5259264. Throughput: 0: 217.1. Samples: 311372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:18,187][00194] Avg episode reward: [(0, '27.743')] [2024-09-01 16:30:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5263360. Throughput: 0: 231.5. Samples: 313036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:23,185][00194] Avg episode reward: [(0, '27.846')] [2024-09-01 16:30:28,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5267456. Throughput: 0: 224.3. Samples: 314390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:28,182][00194] Avg episode reward: [(0, '27.703')] [2024-09-01 16:30:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5271552. Throughput: 0: 220.7. Samples: 314890. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:33,185][00194] Avg episode reward: [(0, '27.890')] [2024-09-01 16:30:38,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5275648. Throughput: 0: 226.4. Samples: 316426. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:38,180][00194] Avg episode reward: [(0, '27.052')] [2024-09-01 16:30:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5279744. Throughput: 0: 231.5. Samples: 317856. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:43,186][00194] Avg episode reward: [(0, '26.846')] [2024-09-01 16:30:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5283840. Throughput: 0: 232.7. Samples: 318702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:48,179][00194] Avg episode reward: [(0, '26.407')] [2024-09-01 16:30:48,655][26015] Updated weights for policy 0, policy_version 1291 (0.2201) [2024-09-01 16:30:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5287936. Throughput: 0: 224.4. Samples: 319768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:53,186][00194] Avg episode reward: [(0, '26.771')] [2024-09-01 16:30:53,484][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_5292032.pth... [2024-09-01 16:30:53,591][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth [2024-09-01 16:30:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5296128. Throughput: 0: 241.4. Samples: 321156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:58,180][00194] Avg episode reward: [(0, '26.717')] [2024-09-01 16:31:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5300224. Throughput: 0: 234.9. Samples: 321944. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:03,179][00194] Avg episode reward: [(0, '26.465')] [2024-09-01 16:31:08,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5304320. Throughput: 0: 224.1. Samples: 323120. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:08,182][00194] Avg episode reward: [(0, '26.401')] [2024-09-01 16:31:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5308416. Throughput: 0: 229.1. Samples: 324698. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:13,182][00194] Avg episode reward: [(0, '26.210')] [2024-09-01 16:31:18,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5312512. Throughput: 0: 232.5. Samples: 325354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:18,185][00194] Avg episode reward: [(0, '25.831')] [2024-09-01 16:31:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5316608. Throughput: 0: 233.2. Samples: 326918. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:31:23,180][00194] Avg episode reward: [(0, '25.144')] [2024-09-01 16:31:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5320704. Throughput: 0: 224.7. Samples: 327966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:31:28,180][00194] Avg episode reward: [(0, '25.316')] [2024-09-01 16:31:33,076][26015] Updated weights for policy 0, policy_version 1301 (0.0550) [2024-09-01 16:31:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5328896. Throughput: 0: 220.6. Samples: 328630. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:33,179][00194] Avg episode reward: [(0, '25.962')] [2024-09-01 16:31:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5332992. Throughput: 0: 230.1. Samples: 330122. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:38,180][00194] Avg episode reward: [(0, '25.709')] [2024-09-01 16:31:43,181][00194] Fps is (10 sec: 818.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5337088. Throughput: 0: 221.4. Samples: 331122. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:43,191][00194] Avg episode reward: [(0, '25.805')] [2024-09-01 16:31:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5341184. Throughput: 0: 223.1. Samples: 331982. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:48,179][00194] Avg episode reward: [(0, '26.897')] [2024-09-01 16:31:53,177][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5345280. Throughput: 0: 233.8. Samples: 333642. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:53,186][00194] Avg episode reward: [(0, '27.249')] [2024-09-01 16:31:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5349376. Throughput: 0: 231.3. Samples: 335106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:58,184][00194] Avg episode reward: [(0, '26.684')] [2024-09-01 16:32:03,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5353472. Throughput: 0: 224.4. Samples: 335450. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:03,189][00194] Avg episode reward: [(0, '26.436')] [2024-09-01 16:32:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5357568. Throughput: 0: 223.6. Samples: 336978. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:08,180][00194] Avg episode reward: [(0, '26.042')] [2024-09-01 16:32:13,177][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5365760. Throughput: 0: 231.1. Samples: 338364. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:13,180][00194] Avg episode reward: [(0, '26.699')] [2024-09-01 16:32:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5365760. Throughput: 0: 234.7. Samples: 339192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:18,180][00194] Avg episode reward: [(0, '26.955')] [2024-09-01 16:32:18,587][26015] Updated weights for policy 0, policy_version 1311 (0.1637) [2024-09-01 16:32:23,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5369856. Throughput: 0: 227.7. Samples: 340368. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:23,186][00194] Avg episode reward: [(0, '26.630')] [2024-09-01 16:32:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5378048. Throughput: 0: 236.6. Samples: 341768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:28,183][00194] Avg episode reward: [(0, '26.073')] [2024-09-01 16:32:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5382144. Throughput: 0: 231.7. Samples: 342408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:33,180][00194] Avg episode reward: [(0, '25.973')] [2024-09-01 16:32:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5386240. Throughput: 0: 220.3. Samples: 343554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:38,180][00194] Avg episode reward: [(0, '25.922')] [2024-09-01 16:32:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5390336. Throughput: 0: 221.9. Samples: 345090. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:43,180][00194] Avg episode reward: [(0, '26.415')] [2024-09-01 16:32:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5394432. Throughput: 0: 224.7. Samples: 345560. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:48,180][00194] Avg episode reward: [(0, '25.910')] [2024-09-01 16:32:49,251][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001318_5398528.pth... [2024-09-01 16:32:49,362][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001265_5181440.pth [2024-09-01 16:32:53,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5398528. Throughput: 0: 227.6. Samples: 347220. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:53,185][00194] Avg episode reward: [(0, '25.335')] [2024-09-01 16:32:58,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5402624. Throughput: 0: 221.5. Samples: 348330. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:58,186][00194] Avg episode reward: [(0, '25.732')] [2024-09-01 16:33:03,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5406720. Throughput: 0: 217.0. Samples: 348956. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:03,183][00194] Avg episode reward: [(0, '25.498')] [2024-09-01 16:33:03,571][26015] Updated weights for policy 0, policy_version 1321 (0.1594) [2024-09-01 16:33:08,178][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5414912. Throughput: 0: 227.3. Samples: 350596. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:08,188][00194] Avg episode reward: [(0, '25.190')] [2024-09-01 16:33:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5419008. Throughput: 0: 219.7. Samples: 351656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:13,183][00194] Avg episode reward: [(0, '25.469')] [2024-09-01 16:33:18,177][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5423104. Throughput: 0: 223.4. Samples: 352462. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:18,187][00194] Avg episode reward: [(0, '26.090')] [2024-09-01 16:33:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5427200. Throughput: 0: 226.1. Samples: 353730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:23,180][00194] Avg episode reward: [(0, '26.379')] [2024-09-01 16:33:28,180][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5431296. Throughput: 0: 229.7. Samples: 355428. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:33:28,182][00194] Avg episode reward: [(0, '26.392')] [2024-09-01 16:33:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5435392. Throughput: 0: 227.3. Samples: 355790. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:33:33,185][00194] Avg episode reward: [(0, '25.307')] [2024-09-01 16:33:38,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5439488. Throughput: 0: 224.9. Samples: 357338. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:33:38,190][00194] Avg episode reward: [(0, '25.175')] [2024-09-01 16:33:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5443584. Throughput: 0: 233.1. Samples: 358820. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:33:43,186][00194] Avg episode reward: [(0, '24.816')] [2024-09-01 16:33:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5447680. Throughput: 0: 236.6. Samples: 359604. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:33:48,180][00194] Avg episode reward: [(0, '24.875')] [2024-09-01 16:33:48,923][26015] Updated weights for policy 0, policy_version 1331 (0.1040) [2024-09-01 16:33:52,544][26002] Signal inference workers to stop experience collection... (350 times) [2024-09-01 16:33:52,592][26015] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-09-01 16:33:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5451776. Throughput: 0: 224.4. Samples: 360694. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:33:53,186][00194] Avg episode reward: [(0, '24.882')] [2024-09-01 16:33:53,817][26002] Signal inference workers to resume experience collection... 
(350 times) [2024-09-01 16:33:53,818][26015] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-09-01 16:33:58,178][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5459968. Throughput: 0: 233.2. Samples: 362152. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:58,184][00194] Avg episode reward: [(0, '25.407')] [2024-09-01 16:34:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5464064. Throughput: 0: 232.4. Samples: 362920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:34:03,187][00194] Avg episode reward: [(0, '24.828')] [2024-09-01 16:34:08,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5468160. Throughput: 0: 227.6. Samples: 363972. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:34:08,182][00194] Avg episode reward: [(0, '24.905')] [2024-09-01 16:34:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5472256. Throughput: 0: 218.9. Samples: 365276. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:34:13,179][00194] Avg episode reward: [(0, '25.595')] [2024-09-01 16:34:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5476352. Throughput: 0: 229.0. Samples: 366094. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:34:18,181][00194] Avg episode reward: [(0, '25.096')] [2024-09-01 16:34:23,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5480448. Throughput: 0: 229.7. Samples: 367674. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:34:23,182][00194] Avg episode reward: [(0, '24.877')] [2024-09-01 16:34:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5484544. Throughput: 0: 220.9. Samples: 368760. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:34:28,179][00194] Avg episode reward: [(0, '25.541')] [2024-09-01 16:34:33,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5488640. Throughput: 0: 214.0. Samples: 369236. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:34:33,180][00194] Avg episode reward: [(0, '25.725')] [2024-09-01 16:34:33,851][26015] Updated weights for policy 0, policy_version 1341 (0.1523) [2024-09-01 16:34:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5496832. Throughput: 0: 229.5. Samples: 371022. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:34:38,180][00194] Avg episode reward: [(0, '25.720')] [2024-09-01 16:34:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5496832. Throughput: 0: 221.5. Samples: 372120. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:34:43,179][00194] Avg episode reward: [(0, '25.858')] [2024-09-01 16:34:48,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5500928. Throughput: 0: 219.4. Samples: 372794. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:34:48,180][00194] Avg episode reward: [(0, '26.018')] [2024-09-01 16:34:52,017][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth... [2024-09-01 16:34:52,131][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_5292032.pth [2024-09-01 16:34:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5509120. Throughput: 0: 227.4. Samples: 374206. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:34:53,180][00194] Avg episode reward: [(0, '26.609')] [2024-09-01 16:34:58,178][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5513216. Throughput: 0: 231.2. Samples: 375680. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:34:58,190][00194] Avg episode reward: [(0, '26.783')] [2024-09-01 16:35:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5517312. Throughput: 0: 226.6. Samples: 376292. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:03,183][00194] Avg episode reward: [(0, '26.419')] [2024-09-01 16:35:08,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5521408. Throughput: 0: 214.1. Samples: 377310. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:08,186][00194] Avg episode reward: [(0, '26.419')] [2024-09-01 16:35:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5525504. Throughput: 0: 234.5. Samples: 379312. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:13,183][00194] Avg episode reward: [(0, '27.281')] [2024-09-01 16:35:18,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5529600. Throughput: 0: 232.7. Samples: 379710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:18,184][00194] Avg episode reward: [(0, '26.924')] [2024-09-01 16:35:20,389][26015] Updated weights for policy 0, policy_version 1351 (0.0541) [2024-09-01 16:35:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5533696. Throughput: 0: 215.7. Samples: 380728. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:23,181][00194] Avg episode reward: [(0, '26.941')] [2024-09-01 16:35:28,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5537792. Throughput: 0: 226.6. Samples: 382318. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:28,186][00194] Avg episode reward: [(0, '26.888')] [2024-09-01 16:35:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5545984. Throughput: 0: 229.3. Samples: 383112. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:33,181][00194] Avg episode reward: [(0, '27.257')] [2024-09-01 16:35:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 5545984. Throughput: 0: 225.2. Samples: 384342. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:35:38,180][00194] Avg episode reward: [(0, '27.257')] [2024-09-01 16:35:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5554176. Throughput: 0: 217.3. Samples: 385458. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:35:43,179][00194] Avg episode reward: [(0, '27.668')] [2024-09-01 16:35:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5558272. Throughput: 0: 224.6. Samples: 386400. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:35:48,180][00194] Avg episode reward: [(0, '27.325')] [2024-09-01 16:35:53,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5558272. Throughput: 0: 225.4. Samples: 387454. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:35:53,185][00194] Avg episode reward: [(0, '27.561')] [2024-09-01 16:35:58,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5562368. Throughput: 0: 181.6. Samples: 387486. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:35:58,180][00194] Avg episode reward: [(0, '27.645')] [2024-09-01 16:36:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5566464. Throughput: 0: 193.9. Samples: 388434. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:36:03,184][00194] Avg episode reward: [(0, '27.329')] [2024-09-01 16:36:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5570560. Throughput: 0: 204.4. Samples: 389926. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:08,184][00194] Avg episode reward: [(0, '26.799')] [2024-09-01 16:36:10,656][26015] Updated weights for policy 0, policy_version 1361 (0.3310) [2024-09-01 16:36:13,178][00194] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5574656. Throughput: 0: 200.3. Samples: 391330. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:13,181][00194] Avg episode reward: [(0, '26.127')] [2024-09-01 16:36:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5578752. Throughput: 0: 191.1. Samples: 391712. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:18,181][00194] Avg episode reward: [(0, '26.861')] [2024-09-01 16:36:23,177][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5582848. Throughput: 0: 193.0. Samples: 393028. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:23,179][00194] Avg episode reward: [(0, '26.413')] [2024-09-01 16:36:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 5586944. Throughput: 0: 204.6. Samples: 394666. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:36:28,185][00194] Avg episode reward: [(0, '26.205')] [2024-09-01 16:36:33,180][00194] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 5591040. Throughput: 0: 203.0. Samples: 395536. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:36:33,183][00194] Avg episode reward: [(0, '26.133')] [2024-09-01 16:36:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.8). Total num frames: 5595136. Throughput: 0: 201.6. Samples: 396526. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:38,180][00194] Avg episode reward: [(0, '26.493')] [2024-09-01 16:36:43,177][00194] Fps is (10 sec: 1229.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5603328. Throughput: 0: 212.4. Samples: 397046. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:43,180][00194] Avg episode reward: [(0, '26.270')] [2024-09-01 16:36:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5607424. Throughput: 0: 229.1. Samples: 398742. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:48,184][00194] Avg episode reward: [(0, '26.885')] [2024-09-01 16:36:52,435][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001370_5611520.pth... [2024-09-01 16:36:52,584][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001318_5398528.pth [2024-09-01 16:36:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5611520. Throughput: 0: 219.0. Samples: 399782. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:53,184][00194] Avg episode reward: [(0, '27.118')] [2024-09-01 16:36:57,312][26015] Updated weights for policy 0, policy_version 1371 (0.2019) [2024-09-01 16:36:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5615616. Throughput: 0: 217.3. Samples: 401110. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:36:58,179][00194] Avg episode reward: [(0, '27.157')] [2024-09-01 16:37:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5619712. Throughput: 0: 224.1. Samples: 401796. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:37:03,185][00194] Avg episode reward: [(0, '27.084')] [2024-09-01 16:37:08,179][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 5623808. Throughput: 0: 224.3. Samples: 403122. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:37:08,182][00194] Avg episode reward: [(0, '26.167')] [2024-09-01 16:37:13,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5627904. Throughput: 0: 213.5. Samples: 404274. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:37:13,182][00194] Avg episode reward: [(0, '26.559')] [2024-09-01 16:37:18,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5632000. Throughput: 0: 211.1. Samples: 405036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:37:18,188][00194] Avg episode reward: [(0, '26.834')] [2024-09-01 16:37:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5636096. Throughput: 0: 227.5. Samples: 406764. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:23,185][00194] Avg episode reward: [(0, '26.512')] [2024-09-01 16:37:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5640192. Throughput: 0: 240.3. Samples: 407858. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:28,180][00194] Avg episode reward: [(0, '26.180')] [2024-09-01 16:37:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5644288. Throughput: 0: 214.3. Samples: 408384. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:37:33,182][00194] Avg episode reward: [(0, '26.672')] [2024-09-01 16:37:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5652480. Throughput: 0: 225.7. Samples: 409938. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:38,180][00194] Avg episode reward: [(0, '26.354')] [2024-09-01 16:37:41,426][26015] Updated weights for policy 0, policy_version 1381 (0.1659) [2024-09-01 16:37:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5656576. Throughput: 0: 216.3. Samples: 410844. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:43,179][00194] Avg episode reward: [(0, '25.707')] [2024-09-01 16:37:45,197][26002] Signal inference workers to stop experience collection... 
(400 times) [2024-09-01 16:37:45,287][26015] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-09-01 16:37:46,980][26002] Signal inference workers to resume experience collection... (400 times) [2024-09-01 16:37:46,982][26015] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-09-01 16:37:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5660672. Throughput: 0: 226.3. Samples: 411978. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:48,181][00194] Avg episode reward: [(0, '25.427')] [2024-09-01 16:37:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5664768. Throughput: 0: 223.8. Samples: 413194. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:53,180][00194] Avg episode reward: [(0, '25.146')] [2024-09-01 16:37:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5668864. Throughput: 0: 237.0. Samples: 414940. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:37:58,180][00194] Avg episode reward: [(0, '25.685')] [2024-09-01 16:38:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5672960. Throughput: 0: 232.2. Samples: 415484. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:38:03,180][00194] Avg episode reward: [(0, '26.396')] [2024-09-01 16:38:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5677056. Throughput: 0: 216.0. Samples: 416486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:08,180][00194] Avg episode reward: [(0, '25.876')] [2024-09-01 16:38:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5681152. Throughput: 0: 228.4. Samples: 418138. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:13,184][00194] Avg episode reward: [(0, '26.671')] [2024-09-01 16:38:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5689344. Throughput: 0: 236.4. Samples: 419024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:18,180][00194] Avg episode reward: [(0, '27.117')] [2024-09-01 16:38:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5689344. Throughput: 0: 224.5. Samples: 420042. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:23,180][00194] Avg episode reward: [(0, '27.637')] [2024-09-01 16:38:28,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5693440. Throughput: 0: 233.0. Samples: 421328. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:38:28,188][00194] Avg episode reward: [(0, '27.652')] [2024-09-01 16:38:28,341][26015] Updated weights for policy 0, policy_version 1391 (0.1704) [2024-09-01 16:38:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5701632. Throughput: 0: 229.3. Samples: 422298. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:33,186][00194] Avg episode reward: [(0, '27.748')] [2024-09-01 16:38:38,178][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 5705728. Throughput: 0: 231.5. Samples: 423610. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:38,182][00194] Avg episode reward: [(0, '27.597')] [2024-09-01 16:38:43,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5709824. Throughput: 0: 208.8. Samples: 424336. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:43,183][00194] Avg episode reward: [(0, '27.989')] [2024-09-01 16:38:48,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5713920. Throughput: 0: 221.0. Samples: 425428. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:48,185][00194] Avg episode reward: [(0, '28.276')] [2024-09-01 16:38:50,276][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001396_5718016.pth... [2024-09-01 16:38:50,382][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth [2024-09-01 16:38:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5718016. Throughput: 0: 240.1. Samples: 427292. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:53,183][00194] Avg episode reward: [(0, '27.585')] [2024-09-01 16:38:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5722112. Throughput: 0: 229.6. Samples: 428470. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:38:58,180][00194] Avg episode reward: [(0, '27.747')] [2024-09-01 16:39:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5726208. Throughput: 0: 217.2. Samples: 428796. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:39:03,182][00194] Avg episode reward: [(0, '27.350')] [2024-09-01 16:39:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5730304. Throughput: 0: 235.5. Samples: 430640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:39:08,180][00194] Avg episode reward: [(0, '26.992')] [2024-09-01 16:39:12,822][26015] Updated weights for policy 0, policy_version 1401 (0.1187) [2024-09-01 16:39:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5738496. Throughput: 0: 231.6. Samples: 431750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:39:13,186][00194] Avg episode reward: [(0, '27.130')] [2024-09-01 16:39:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 5738496. Throughput: 0: 226.0. Samples: 432468. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:39:18,181][00194] Avg episode reward: [(0, '27.574')] [2024-09-01 16:39:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5746688. Throughput: 0: 223.6. Samples: 433670. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:39:23,186][00194] Avg episode reward: [(0, '27.670')] [2024-09-01 16:39:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5750784. Throughput: 0: 244.6. Samples: 435342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:39:28,187][00194] Avg episode reward: [(0, '28.205')] [2024-09-01 16:39:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5754880. Throughput: 0: 232.3. Samples: 435880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:39:33,186][00194] Avg episode reward: [(0, '28.126')] [2024-09-01 16:39:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5758976. Throughput: 0: 212.3. Samples: 436844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:39:38,180][00194] Avg episode reward: [(0, '28.321')] [2024-09-01 16:39:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5763072. Throughput: 0: 222.5. Samples: 438484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:39:43,186][00194] Avg episode reward: [(0, '28.215')] [2024-09-01 16:39:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5767168. Throughput: 0: 229.5. Samples: 439124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:39:48,180][00194] Avg episode reward: [(0, '28.771')] [2024-09-01 16:39:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5771264. Throughput: 0: 221.1. Samples: 440588. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:39:53,184][00194] Avg episode reward: [(0, '29.294')] [2024-09-01 16:39:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5775360. Throughput: 0: 226.7. Samples: 441952. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:39:58,180][00194] Avg episode reward: [(0, '29.630')] [2024-09-01 16:39:59,619][26015] Updated weights for policy 0, policy_version 1411 (0.2575) [2024-09-01 16:40:03,032][26002] Saving new best policy, reward=29.630! [2024-09-01 16:40:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5783552. Throughput: 0: 224.9. Samples: 442588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:03,180][00194] Avg episode reward: [(0, '29.775')] [2024-09-01 16:40:07,874][26002] Saving new best policy, reward=29.775! [2024-09-01 16:40:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5787648. Throughput: 0: 230.4. Samples: 444036. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:08,184][00194] Avg episode reward: [(0, '29.622')] [2024-09-01 16:40:13,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 5787648. Throughput: 0: 215.9. Samples: 445056. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:13,182][00194] Avg episode reward: [(0, '29.296')] [2024-09-01 16:40:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5795840. Throughput: 0: 225.8. Samples: 446042. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:18,180][00194] Avg episode reward: [(0, '28.875')] [2024-09-01 16:40:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5799936. Throughput: 0: 235.5. Samples: 447442. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:23,179][00194] Avg episode reward: [(0, '29.872')] [2024-09-01 16:40:25,841][26002] Saving new best policy, reward=29.872! [2024-09-01 16:40:28,181][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 5804032. Throughput: 0: 226.6. Samples: 448684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:28,184][00194] Avg episode reward: [(0, '29.518')] [2024-09-01 16:40:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5808128. Throughput: 0: 223.5. Samples: 449180. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:33,179][00194] Avg episode reward: [(0, '29.178')] [2024-09-01 16:40:38,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5812224. Throughput: 0: 228.6. Samples: 450874. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:38,181][00194] Avg episode reward: [(0, '29.242')] [2024-09-01 16:40:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5816320. Throughput: 0: 228.0. Samples: 452210. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:40:43,180][00194] Avg episode reward: [(0, '29.231')] [2024-09-01 16:40:44,210][26015] Updated weights for policy 0, policy_version 1421 (0.0544) [2024-09-01 16:40:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5820416. Throughput: 0: 223.9. Samples: 452662. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:40:48,186][00194] Avg episode reward: [(0, '29.310')] [2024-09-01 16:40:49,709][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001422_5824512.pth... [2024-09-01 16:40:49,817][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001370_5611520.pth [2024-09-01 16:40:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5824512. 
Throughput: 0: 227.0. Samples: 454250. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:40:53,180][00194] Avg episode reward: [(0, '29.215')] [2024-09-01 16:40:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5832704. Throughput: 0: 229.6. Samples: 455390. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:40:58,187][00194] Avg episode reward: [(0, '29.728')] [2024-09-01 16:41:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5836800. Throughput: 0: 226.2. Samples: 456220. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:03,183][00194] Avg episode reward: [(0, '29.735')] [2024-09-01 16:41:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5840896. Throughput: 0: 218.3. Samples: 457266. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:08,188][00194] Avg episode reward: [(0, '29.735')] [2024-09-01 16:41:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5844992. Throughput: 0: 227.1. Samples: 458904. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:13,180][00194] Avg episode reward: [(0, '29.259')] [2024-09-01 16:41:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5849088. Throughput: 0: 231.3. Samples: 459588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:18,180][00194] Avg episode reward: [(0, '30.461')] [2024-09-01 16:41:20,439][26002] Saving new best policy, reward=30.461! [2024-09-01 16:41:23,183][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5853184. Throughput: 0: 220.0. Samples: 460776. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:23,186][00194] Avg episode reward: [(0, '30.677')] [2024-09-01 16:41:25,831][26002] Saving new best policy, reward=30.677! 
[2024-09-01 16:41:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5857280. Throughput: 0: 225.9. Samples: 462376. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:28,180][00194] Avg episode reward: [(0, '31.919')] [2024-09-01 16:41:30,117][26015] Updated weights for policy 0, policy_version 1431 (0.1548) [2024-09-01 16:41:32,499][26002] Signal inference workers to stop experience collection... (450 times) [2024-09-01 16:41:32,562][26015] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-09-01 16:41:33,177][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5861376. Throughput: 0: 226.8. Samples: 462868. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:33,185][00194] Avg episode reward: [(0, '32.206')] [2024-09-01 16:41:33,431][26002] Saving new best policy, reward=31.919! [2024-09-01 16:41:33,433][26002] Signal inference workers to resume experience collection... (450 times) [2024-09-01 16:41:33,443][26015] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-09-01 16:41:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5865472. Throughput: 0: 227.5. Samples: 464488. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:38,180][00194] Avg episode reward: [(0, '31.630')] [2024-09-01 16:41:38,667][26002] Saving new best policy, reward=32.206! [2024-09-01 16:41:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5869568. Throughput: 0: 225.4. Samples: 465532. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:43,190][00194] Avg episode reward: [(0, '31.349')] [2024-09-01 16:41:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5877760. Throughput: 0: 227.9. Samples: 466474. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:41:48,180][00194] Avg episode reward: [(0, '30.458')] [2024-09-01 16:41:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5881856. Throughput: 0: 232.0. Samples: 467704. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:41:53,180][00194] Avg episode reward: [(0, '30.334')] [2024-09-01 16:41:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5885952. Throughput: 0: 221.0. Samples: 468848. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:41:58,185][00194] Avg episode reward: [(0, '30.870')] [2024-09-01 16:42:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5890048. Throughput: 0: 224.7. Samples: 469698. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:03,180][00194] Avg episode reward: [(0, '30.500')] [2024-09-01 16:42:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5894144. Throughput: 0: 233.7. Samples: 471290. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:08,187][00194] Avg episode reward: [(0, '30.023')] [2024-09-01 16:42:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5898240. Throughput: 0: 224.9. Samples: 472496. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:42:13,182][00194] Avg episode reward: [(0, '29.890')] [2024-09-01 16:42:15,935][26015] Updated weights for policy 0, policy_version 1441 (0.2283) [2024-09-01 16:42:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5902336. Throughput: 0: 221.2. Samples: 472820. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:42:18,180][00194] Avg episode reward: [(0, '29.184')] [2024-09-01 16:42:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 5906432. Throughput: 0: 226.5. Samples: 474682. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:42:23,183][00194] Avg episode reward: [(0, '29.022')] [2024-09-01 16:42:28,182][00194] Fps is (10 sec: 1228.2, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 5914624. Throughput: 0: 234.3. Samples: 476078. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:28,185][00194] Avg episode reward: [(0, '30.580')] [2024-09-01 16:42:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5918720. Throughput: 0: 229.8. Samples: 476814. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:33,179][00194] Avg episode reward: [(0, '30.606')] [2024-09-01 16:42:38,177][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5922816. Throughput: 0: 226.2. Samples: 477882. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:38,184][00194] Avg episode reward: [(0, '30.535')] [2024-09-01 16:42:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5926912. Throughput: 0: 236.6. Samples: 479494. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:43,180][00194] Avg episode reward: [(0, '30.297')] [2024-09-01 16:42:48,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5931008. Throughput: 0: 233.8. Samples: 480220. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:48,186][00194] Avg episode reward: [(0, '29.842')] [2024-09-01 16:42:50,977][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001449_5935104.pth... [2024-09-01 16:42:51,095][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001396_5718016.pth [2024-09-01 16:42:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5935104. Throughput: 0: 221.6. Samples: 481264. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:42:53,180][00194] Avg episode reward: [(0, '29.402')] [2024-09-01 16:42:58,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5939200. Throughput: 0: 229.0. Samples: 482802. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:42:58,180][00194] Avg episode reward: [(0, '29.277')] [2024-09-01 16:42:59,731][26015] Updated weights for policy 0, policy_version 1451 (0.1034) [2024-09-01 16:43:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5943296. Throughput: 0: 241.0. Samples: 483664. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:43:03,182][00194] Avg episode reward: [(0, '30.124')] [2024-09-01 16:43:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5947392. Throughput: 0: 225.6. Samples: 484836. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0) [2024-09-01 16:43:08,184][00194] Avg episode reward: [(0, '30.448')] [2024-09-01 16:43:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5951488. Throughput: 0: 222.8. Samples: 486104. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0) [2024-09-01 16:43:13,184][00194] Avg episode reward: [(0, '30.254')] [2024-09-01 16:43:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5959680. Throughput: 0: 226.3. Samples: 486998. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0) [2024-09-01 16:43:18,180][00194] Avg episode reward: [(0, '30.282')] [2024-09-01 16:43:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5963776. Throughput: 0: 234.2. Samples: 488422. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:43:23,181][00194] Avg episode reward: [(0, '30.728')] [2024-09-01 16:43:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5967872. Throughput: 0: 221.6. Samples: 489468. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:43:28,186][00194] Avg episode reward: [(0, '29.194')] [2024-09-01 16:43:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5971968. Throughput: 0: 221.4. Samples: 490182. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:43:33,185][00194] Avg episode reward: [(0, '29.182')] [2024-09-01 16:43:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5976064. Throughput: 0: 230.0. Samples: 491614. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:43:38,180][00194] Avg episode reward: [(0, '29.130')] [2024-09-01 16:43:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5980160. Throughput: 0: 229.9. Samples: 493148. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:43:43,182][00194] Avg episode reward: [(0, '28.244')] [2024-09-01 16:43:44,795][26015] Updated weights for policy 0, policy_version 1461 (0.1507) [2024-09-01 16:43:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5984256. Throughput: 0: 217.7. Samples: 493462. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:43:48,179][00194] Avg episode reward: [(0, '28.026')] [2024-09-01 16:43:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5988352. Throughput: 0: 230.8. Samples: 495220. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:43:53,180][00194] Avg episode reward: [(0, '27.804')] [2024-09-01 16:43:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5996544. Throughput: 0: 233.6. Samples: 496618. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:43:58,186][00194] Avg episode reward: [(0, '29.239')] [2024-09-01 16:44:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 6000640. Throughput: 0: 229.3. Samples: 497318. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:44:03,184][00194] Avg episode reward: [(0, '28.802')] [2024-09-01 16:44:07,573][26002] Stopping Batcher_0... [2024-09-01 16:44:07,574][26002] Loop batcher_evt_loop terminating... [2024-09-01 16:44:07,585][00194] Component Batcher_0 stopped! [2024-09-01 16:44:07,630][26015] Weights refcount: 2 0 [2024-09-01 16:44:07,635][00194] Component InferenceWorker_p0-w0 stopped! [2024-09-01 16:44:07,638][26015] Stopping InferenceWorker_p0-w0... [2024-09-01 16:44:07,642][26015] Loop inference_proc0-0_evt_loop terminating... [2024-09-01 16:44:08,076][26021] Stopping RolloutWorker_w5... [2024-09-01 16:44:08,086][26021] Loop rollout_proc5_evt_loop terminating... [2024-09-01 16:44:08,077][00194] Component RolloutWorker_w5 stopped! [2024-09-01 16:44:08,115][00194] Component RolloutWorker_w3 stopped! [2024-09-01 16:44:08,115][26019] Stopping RolloutWorker_w3... [2024-09-01 16:44:08,132][26019] Loop rollout_proc3_evt_loop terminating... [2024-09-01 16:44:08,133][26023] Stopping RolloutWorker_w7... [2024-09-01 16:44:08,141][26023] Loop rollout_proc7_evt_loop terminating... [2024-09-01 16:44:08,134][00194] Component RolloutWorker_w7 stopped! [2024-09-01 16:44:08,164][00194] Component RolloutWorker_w0 stopped! [2024-09-01 16:44:08,181][00194] Component RolloutWorker_w2 stopped! [2024-09-01 16:44:08,190][26017] Stopping RolloutWorker_w2... [2024-09-01 16:44:08,173][26016] Stopping RolloutWorker_w0... [2024-09-01 16:44:08,205][26016] Loop rollout_proc0_evt_loop terminating... [2024-09-01 16:44:08,208][26017] Loop rollout_proc2_evt_loop terminating... [2024-09-01 16:44:08,247][00194] Component RolloutWorker_w4 stopped! [2024-09-01 16:44:08,248][26020] Stopping RolloutWorker_w4... [2024-09-01 16:44:08,257][26020] Loop rollout_proc4_evt_loop terminating... [2024-09-01 16:44:08,276][26018] Stopping RolloutWorker_w1... [2024-09-01 16:44:08,276][00194] Component RolloutWorker_w1 stopped! 
[2024-09-01 16:44:08,276][26018] Loop rollout_proc1_evt_loop terminating... [2024-09-01 16:44:08,333][26022] Stopping RolloutWorker_w6... [2024-09-01 16:44:08,332][00194] Component RolloutWorker_w6 stopped! [2024-09-01 16:44:08,335][26022] Loop rollout_proc6_evt_loop terminating... [2024-09-01 16:44:12,546][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth... [2024-09-01 16:44:12,616][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001422_5824512.pth [2024-09-01 16:44:12,624][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth... [2024-09-01 16:44:12,714][26002] Stopping LearnerWorker_p0... [2024-09-01 16:44:12,715][26002] Loop learner_proc0_evt_loop terminating... [2024-09-01 16:44:12,715][00194] Component LearnerWorker_p0 stopped! [2024-09-01 16:44:12,718][00194] Waiting for process learner_proc0 to stop... [2024-09-01 16:44:13,178][00194] Waiting for process inference_proc0-0 to join... [2024-09-01 16:44:13,183][00194] Waiting for process rollout_proc0 to join... [2024-09-01 16:44:13,189][00194] Waiting for process rollout_proc1 to join... [2024-09-01 16:44:13,193][00194] Waiting for process rollout_proc2 to join... [2024-09-01 16:44:13,199][00194] Waiting for process rollout_proc3 to join... [2024-09-01 16:44:13,208][00194] Waiting for process rollout_proc4 to join... [2024-09-01 16:44:13,214][00194] Waiting for process rollout_proc5 to join... [2024-09-01 16:44:13,220][00194] Waiting for process rollout_proc6 to join... [2024-09-01 16:44:13,225][00194] Waiting for process rollout_proc7 to join... 
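The checkpoint filenames above appear to encode the train step and the total env-step count (e.g. checkpoint_000001467_6008832.pth). A minimal parsing sketch, assuming that naming convention holds; the parse_checkpoint_name helper is hypothetical, not part of Sample Factory:

```python
import re

def parse_checkpoint_name(path: str) -> tuple[int, int]:
    """Extract (train_step, env_steps) from a checkpoint_<step>_<frames>.pth path."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"not a checkpoint filename: {path}")
    return int(m.group(1)), int(m.group(2))

train_step, env_steps = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth"
)
print(train_step, env_steps)    # 1467 6008832
# The ratio is exactly 4096 env frames per training step for this run.
print(env_steps // train_step)  # 4096
```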
[2024-09-01 16:44:13,234][00194] Batcher 0 profile tree view:
batching: 9.2162, releasing_batches: 0.1772
[2024-09-01 16:44:13,240][00194] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 30.9066
update_model: 81.0060
  weight_update: 0.1005
one_step: 0.0376
  handle_policy_step: 1436.1295
    deserialize: 44.8616, stack: 7.2046, obs_to_device_normalize: 241.7567, forward: 1055.2233, send_messages: 32.3865
    prepare_outputs: 17.1816
      to_cpu: 1.7264
[2024-09-01 16:44:13,242][00194] Learner 0 profile tree view:
misc: 0.0034, prepare_batch: 631.4118
train: 1567.9798
  epoch_init: 0.0036, minibatch_init: 0.0053, losses_postprocess: 0.0786, kl_divergence: 0.2734, after_optimizer: 1.2242
  calculate_losses: 757.4961
    losses_init: 0.0022, forward_head: 673.4290, bptt_initial: 2.1597, tail: 1.6841, advantages_returns: 0.1136, losses: 0.8179
    bptt: 79.0010
      bptt_forward_core: 78.5114
  update: 808.5816
    clip: 1.8736
[2024-09-01 16:44:13,244][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2782, enqueue_policy_requests: 28.3148, env_step: 831.1422, overhead: 20.8807, complete_rollouts: 8.1561
save_policy_outputs: 22.1874
  split_output_tensors: 7.4809
[2024-09-01 16:44:13,247][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3237, enqueue_policy_requests: 27.7500, env_step: 817.1254, overhead: 19.3453, complete_rollouts: 9.0290
save_policy_outputs: 21.3592
  split_output_tensors: 7.1296
[2024-09-01 16:44:13,251][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 16:44:13,253][00194] Runner profile tree view:
main_loop: 2242.7525
[2024-09-01 16:44:13,254][00194] Collected {0: 6008832}, FPS: 887.6
[2024-09-01 16:49:06,149][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 16:49:06,153][00194] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-01 16:49:06,156][00194] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-01 16:49:06,159][00194] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-01 16:49:06,162][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-01 16:49:06,165][00194] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-01 16:49:06,167][00194] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-01 16:49:06,170][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-01 16:49:06,171][00194] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-01 16:49:06,173][00194] Adding new argument 'hf_repository'='jarski/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-01 16:49:06,174][00194] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-01 16:49:06,175][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-01 16:49:06,176][00194] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-01 16:49:06,177][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-01 16:49:06,180][00194] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-01 16:49:06,214][00194] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:49:06,218][00194] RunningMeanStd input shape: (3, 72, 128) [2024-09-01 16:49:06,223][00194] RunningMeanStd input shape: (1,) [2024-09-01 16:49:06,266][00194] ConvEncoder: input_channels=3 [2024-09-01 16:49:06,433][00194] Conv encoder output size: 512 [2024-09-01 16:49:06,435][00194] Policy head output size: 512 [2024-09-01 16:49:06,461][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth... [2024-09-01 16:49:07,124][00194] Num frames 100... 
[2024-09-01 16:49:07,354][00194] Num frames 200... [2024-09-01 16:49:07,571][00194] Num frames 300... [2024-09-01 16:49:07,831][00194] Num frames 400... [2024-09-01 16:49:08,050][00194] Num frames 500... [2024-09-01 16:49:08,267][00194] Num frames 600... [2024-09-01 16:49:08,478][00194] Num frames 700... [2024-09-01 16:49:08,688][00194] Num frames 800... [2024-09-01 16:49:08,901][00194] Num frames 900... [2024-09-01 16:49:09,119][00194] Num frames 1000... [2024-09-01 16:49:09,328][00194] Avg episode rewards: #0: 23.710, true rewards: #0: 10.710 [2024-09-01 16:49:09,330][00194] Avg episode reward: 23.710, avg true_objective: 10.710 [2024-09-01 16:49:09,398][00194] Num frames 1100... [2024-09-01 16:49:09,614][00194] Num frames 1200... [2024-09-01 16:49:09,843][00194] Num frames 1300... [2024-09-01 16:49:10,075][00194] Num frames 1400... [2024-09-01 16:49:10,305][00194] Num frames 1500... [2024-09-01 16:49:10,511][00194] Avg episode rewards: #0: 16.355, true rewards: #0: 7.855 [2024-09-01 16:49:10,513][00194] Avg episode reward: 16.355, avg true_objective: 7.855 [2024-09-01 16:49:10,583][00194] Num frames 1600... [2024-09-01 16:49:10,803][00194] Num frames 1700... [2024-09-01 16:49:11,016][00194] Num frames 1800... [2024-09-01 16:49:11,226][00194] Num frames 1900... [2024-09-01 16:49:11,441][00194] Num frames 2000... [2024-09-01 16:49:11,660][00194] Num frames 2100... [2024-09-01 16:49:11,879][00194] Num frames 2200... [2024-09-01 16:49:12,121][00194] Num frames 2300... [2024-09-01 16:49:12,410][00194] Num frames 2400... [2024-09-01 16:49:12,701][00194] Num frames 2500... [2024-09-01 16:49:12,990][00194] Num frames 2600... [2024-09-01 16:49:13,269][00194] Num frames 2700... [2024-09-01 16:49:13,547][00194] Num frames 2800... [2024-09-01 16:49:13,834][00194] Num frames 2900... [2024-09-01 16:49:14,134][00194] Num frames 3000... 
[2024-09-01 16:49:14,217][00194] Avg episode rewards: #0: 23.360, true rewards: #0: 10.027 [2024-09-01 16:49:14,220][00194] Avg episode reward: 23.360, avg true_objective: 10.027 [2024-09-01 16:49:14,490][00194] Num frames 3100... [2024-09-01 16:49:14,772][00194] Num frames 3200... [2024-09-01 16:49:15,076][00194] Num frames 3300... [2024-09-01 16:49:15,334][00194] Num frames 3400... [2024-09-01 16:49:15,572][00194] Avg episode rewards: #0: 19.970, true rewards: #0: 8.720 [2024-09-01 16:49:15,574][00194] Avg episode reward: 19.970, avg true_objective: 8.720 [2024-09-01 16:49:15,604][00194] Num frames 3500... [2024-09-01 16:49:15,809][00194] Num frames 3600... [2024-09-01 16:49:16,017][00194] Num frames 3700... [2024-09-01 16:49:16,239][00194] Num frames 3800... [2024-09-01 16:49:16,451][00194] Num frames 3900... [2024-09-01 16:49:16,663][00194] Num frames 4000... [2024-09-01 16:49:16,786][00194] Avg episode rewards: #0: 17.864, true rewards: #0: 8.064 [2024-09-01 16:49:16,788][00194] Avg episode reward: 17.864, avg true_objective: 8.064 [2024-09-01 16:49:16,929][00194] Num frames 4100... [2024-09-01 16:49:17,154][00194] Num frames 4200... [2024-09-01 16:49:17,358][00194] Num frames 4300... [2024-09-01 16:49:17,559][00194] Num frames 4400... [2024-09-01 16:49:17,767][00194] Num frames 4500... [2024-09-01 16:49:17,925][00194] Avg episode rewards: #0: 16.073, true rewards: #0: 7.573 [2024-09-01 16:49:17,927][00194] Avg episode reward: 16.073, avg true_objective: 7.573 [2024-09-01 16:49:18,047][00194] Num frames 4600... [2024-09-01 16:49:18,282][00194] Num frames 4700... [2024-09-01 16:49:18,500][00194] Num frames 4800... [2024-09-01 16:49:18,719][00194] Num frames 4900... [2024-09-01 16:49:18,929][00194] Num frames 5000... [2024-09-01 16:49:19,154][00194] Num frames 5100... [2024-09-01 16:49:19,383][00194] Num frames 5200... [2024-09-01 16:49:19,621][00194] Num frames 5300... [2024-09-01 16:49:19,842][00194] Num frames 5400... 
[2024-09-01 16:49:20,061][00194] Num frames 5500... [2024-09-01 16:49:20,293][00194] Num frames 5600... [2024-09-01 16:49:20,497][00194] Num frames 5700... [2024-09-01 16:49:20,708][00194] Num frames 5800... [2024-09-01 16:49:20,915][00194] Num frames 5900... [2024-09-01 16:49:21,087][00194] Avg episode rewards: #0: 18.931, true rewards: #0: 8.503 [2024-09-01 16:49:21,091][00194] Avg episode reward: 18.931, avg true_objective: 8.503 [2024-09-01 16:49:21,191][00194] Num frames 6000... [2024-09-01 16:49:21,410][00194] Num frames 6100... [2024-09-01 16:49:21,625][00194] Num frames 6200... [2024-09-01 16:49:21,831][00194] Num frames 6300... [2024-09-01 16:49:22,044][00194] Num frames 6400... [2024-09-01 16:49:22,269][00194] Num frames 6500... [2024-09-01 16:49:22,490][00194] Num frames 6600... [2024-09-01 16:49:22,705][00194] Num frames 6700... [2024-09-01 16:49:22,943][00194] Num frames 6800... [2024-09-01 16:49:23,176][00194] Num frames 6900... [2024-09-01 16:49:23,417][00194] Num frames 7000... [2024-09-01 16:49:23,654][00194] Num frames 7100... [2024-09-01 16:49:23,871][00194] Num frames 7200... [2024-09-01 16:49:24,100][00194] Num frames 7300... [2024-09-01 16:49:24,317][00194] Num frames 7400... [2024-09-01 16:49:24,542][00194] Num frames 7500... [2024-09-01 16:49:24,760][00194] Avg episode rewards: #0: 21.964, true rewards: #0: 9.464 [2024-09-01 16:49:24,762][00194] Avg episode reward: 21.964, avg true_objective: 9.464 [2024-09-01 16:49:24,832][00194] Num frames 7600... [2024-09-01 16:49:25,063][00194] Num frames 7700... [2024-09-01 16:49:25,328][00194] Num frames 7800... [2024-09-01 16:49:25,643][00194] Num frames 7900... [2024-09-01 16:49:25,922][00194] Num frames 8000... [2024-09-01 16:49:26,197][00194] Num frames 8100... [2024-09-01 16:49:26,486][00194] Num frames 8200... [2024-09-01 16:49:26,807][00194] Num frames 8300... [2024-09-01 16:49:27,106][00194] Num frames 8400... [2024-09-01 16:49:27,399][00194] Num frames 8500... 
[2024-09-01 16:49:27,493][00194] Avg episode rewards: #0: 21.791, true rewards: #0: 9.458 [2024-09-01 16:49:27,497][00194] Avg episode reward: 21.791, avg true_objective: 9.458 [2024-09-01 16:49:27,755][00194] Num frames 8600... [2024-09-01 16:49:28,054][00194] Num frames 8700... [2024-09-01 16:49:28,371][00194] Num frames 8800... [2024-09-01 16:49:28,609][00194] Num frames 8900... [2024-09-01 16:49:28,835][00194] Num frames 9000... [2024-09-01 16:49:28,942][00194] Avg episode rewards: #0: 20.724, true rewards: #0: 9.024 [2024-09-01 16:49:28,943][00194] Avg episode reward: 20.724, avg true_objective: 9.024 [2024-09-01 16:50:30,694][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-01 16:50:35,080][00194] The model has been pushed to https://huggingface.co/jarski/rl_course_vizdoom_health_gathering_supreme [2024-09-01 16:51:00,117][00194] Environment doom_basic already registered, overwriting... [2024-09-01 16:51:00,120][00194] Environment doom_two_colors_easy already registered, overwriting... [2024-09-01 16:51:00,121][00194] Environment doom_two_colors_hard already registered, overwriting... [2024-09-01 16:51:00,126][00194] Environment doom_dm already registered, overwriting... [2024-09-01 16:51:00,128][00194] Environment doom_dwango5 already registered, overwriting... [2024-09-01 16:51:00,130][00194] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-01 16:51:00,132][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-01 16:51:00,135][00194] Environment doom_my_way_home already registered, overwriting... [2024-09-01 16:51:00,138][00194] Environment doom_deadly_corridor already registered, overwriting... [2024-09-01 16:51:00,140][00194] Environment doom_defend_the_center already registered, overwriting... [2024-09-01 16:51:00,142][00194] Environment doom_defend_the_line already registered, overwriting... 
[2024-09-01 16:51:00,145][00194] Environment doom_health_gathering already registered, overwriting... [2024-09-01 16:51:00,148][00194] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-01 16:51:00,150][00194] Environment doom_battle already registered, overwriting... [2024-09-01 16:51:00,152][00194] Environment doom_battle2 already registered, overwriting... [2024-09-01 16:51:00,154][00194] Environment doom_duel_bots already registered, overwriting... [2024-09-01 16:51:00,155][00194] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-01 16:51:00,157][00194] Environment doom_duel already registered, overwriting... [2024-09-01 16:51:00,159][00194] Environment doom_deathmatch_full already registered, overwriting... [2024-09-01 16:51:00,161][00194] Environment doom_benchmark already registered, overwriting... [2024-09-01 16:51:00,163][00194] register_encoder_factory: [2024-09-01 16:51:31,464][00194] Environment doom_basic already registered, overwriting... [2024-09-01 16:51:31,466][00194] Environment doom_two_colors_easy already registered, overwriting... [2024-09-01 16:51:31,468][00194] Environment doom_two_colors_hard already registered, overwriting... [2024-09-01 16:51:31,474][00194] Environment doom_dm already registered, overwriting... [2024-09-01 16:51:31,476][00194] Environment doom_dwango5 already registered, overwriting... [2024-09-01 16:51:31,478][00194] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-01 16:51:31,479][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-01 16:51:31,483][00194] Environment doom_my_way_home already registered, overwriting... [2024-09-01 16:51:31,486][00194] Environment doom_deadly_corridor already registered, overwriting... [2024-09-01 16:51:31,487][00194] Environment doom_defend_the_center already registered, overwriting... 
[2024-09-01 16:51:31,489][00194] Environment doom_defend_the_line already registered, overwriting... [2024-09-01 16:51:31,490][00194] Environment doom_health_gathering already registered, overwriting... [2024-09-01 16:51:31,491][00194] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-01 16:51:31,492][00194] Environment doom_battle already registered, overwriting... [2024-09-01 16:51:31,493][00194] Environment doom_battle2 already registered, overwriting... [2024-09-01 16:51:31,495][00194] Environment doom_duel_bots already registered, overwriting... [2024-09-01 16:51:31,496][00194] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-01 16:51:31,497][00194] Environment doom_duel already registered, overwriting... [2024-09-01 16:51:31,498][00194] Environment doom_deathmatch_full already registered, overwriting... [2024-09-01 16:51:31,500][00194] Environment doom_benchmark already registered, overwriting... [2024-09-01 16:51:31,501][00194] register_encoder_factory: [2024-09-01 16:51:31,529][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-01 16:51:31,537][00194] Experiment dir /content/train_dir/default_experiment already exists! [2024-09-01 16:51:31,539][00194] Resuming existing experiment from /content/train_dir/default_experiment... 
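The evaluation pass earlier in this log prints a running average after each of the 10 episodes. Individual episode rewards can be recovered from successive averages via r_k = k*avg_k - (k-1)*avg_{k-1}; a small sketch using the averages reported above:

```python
# Running averages reported by the eval log, one per finished episode.
avgs = [23.710, 16.355, 23.360, 19.970, 17.864,
        16.073, 18.931, 21.964, 21.791, 20.724]

episode_rewards = []
prev_total = 0.0
for k, avg in enumerate(avgs, start=1):
    total = k * avg                                # cumulative reward after k episodes
    episode_rewards.append(round(total - prev_total, 3))
    prev_total = total

print(episode_rewards[0])  # 23.71 (first episode equals the first average)
```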
[2024-09-01 16:51:31,540][00194] Weights and Biases integration disabled
[2024-09-01 16:51:31,547][00194] Environment var CUDA_VISIBLE_DEVICES is
[2024-09-01 16:51:34,202][00194] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=6000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-01 16:51:34,205][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
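Under the configuration above, the sampling arithmetic appears to line up so that one rollout round across all environments fills exactly one training batch. A sketch of that arithmetic (an interpretation of the config values, not Sample Factory's internal code):

```python
# Values taken from the config dump above.
num_workers = 8
num_envs_per_worker = 4
rollout = 32          # env steps collected per environment per rollout
batch_size = 1024

total_envs = num_workers * num_envs_per_worker    # 32 parallel environments
frames_per_rollout_round = total_envs * rollout   # 1024 frames per round
assert frames_per_rollout_round == batch_size     # one round == one batch
print(total_envs, frames_per_rollout_round)       # 32 1024
```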
[2024-09-01 16:51:34,209][00194] Rollout worker 0 uses device cpu [2024-09-01 16:51:34,215][00194] Rollout worker 1 uses device cpu [2024-09-01 16:51:34,220][00194] Rollout worker 2 uses device cpu [2024-09-01 16:51:34,222][00194] Rollout worker 3 uses device cpu [2024-09-01 16:51:34,227][00194] Rollout worker 4 uses device cpu [2024-09-01 16:51:34,229][00194] Rollout worker 5 uses device cpu [2024-09-01 16:51:34,230][00194] Rollout worker 6 uses device cpu [2024-09-01 16:51:34,236][00194] Rollout worker 7 uses device cpu [2024-09-01 16:51:34,424][00194] InferenceWorker_p0-w0: min num requests: 2 [2024-09-01 16:51:34,477][00194] Starting all processes... [2024-09-01 16:51:34,480][00194] Starting process learner_proc0 [2024-09-01 16:51:34,569][00194] Starting all processes... [2024-09-01 16:51:34,585][00194] Starting process inference_proc0-0 [2024-09-01 16:51:34,597][00194] Starting process rollout_proc0 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc1 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc2 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc3 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc4 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc5 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc6 [2024-09-01 16:51:34,598][00194] Starting process rollout_proc7 [2024-09-01 16:51:49,837][36931] Starting seed is not provided [2024-09-01 16:51:49,840][36931] Initializing actor-critic model on device cpu [2024-09-01 16:51:49,841][36931] RunningMeanStd input shape: (3, 72, 128) [2024-09-01 16:51:49,856][36931] RunningMeanStd input shape: (1,) [2024-09-01 16:51:49,914][36947] Worker 0 uses CPU cores [0] [2024-09-01 16:51:49,954][36931] ConvEncoder: input_channels=3 [2024-09-01 16:51:50,249][36948] Worker 3 uses CPU cores [1] [2024-09-01 16:51:50,393][36952] Worker 7 uses CPU cores [1] [2024-09-01 16:51:50,420][36950] Worker 5 uses CPU cores [1] [2024-09-01 16:51:50,471][36951] Worker 6 
uses CPU cores [0]
[2024-09-01 16:51:50,513][36945] Worker 1 uses CPU cores [1]
[2024-09-01 16:51:50,674][36946] Worker 2 uses CPU cores [0]
[2024-09-01 16:51:50,718][36949] Worker 4 uses CPU cores [0]
[2024-09-01 16:51:50,755][36931] Conv encoder output size: 512
[2024-09-01 16:51:50,756][36931] Policy head output size: 512
[2024-09-01 16:51:50,785][36931] Created Actor Critic model with architecture:
[2024-09-01 16:51:50,786][36931] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 16:51:51,787][36931] Using optimizer
[2024-09-01 16:51:51,789][36931] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth...
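The module dump above hides the Conv2d parameters. Assuming Sample Factory's convnet_simple uses the common (kernel, stride) sequence (8,4), (4,2), (3,2) with 32/64/128 output channels — an assumption, not confirmed by the log — the spatial arithmetic for the (3, 72, 128) input can be checked with integer math; the flattened feature count is what the Linear layer in mlp_layers would map to the reported 512-dim encoder output:

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output size of a valid (no-padding) convolution along one dimension."""
    return (size - kernel) // stride + 1

h, w = 72, 128  # resized Doom observation, per "resize resolution: (128, 72)"
layers = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]  # assumed (channels, kernel, stride)
for _, k, s in layers:
    h, w = conv_out(h, k, s), conv_out(w, k, s)

flat = layers[-1][0] * h * w  # flattened features fed to the 512-unit MLP layer
print(h, w, flat)  # 3 6 2304
```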
[2024-09-01 16:51:51,879][36931] Loading model from checkpoint [2024-09-01 16:51:51,950][36931] Loaded experiment state at self.train_step=1467, self.env_steps=6008832 [2024-09-01 16:51:51,951][36931] Initialized policy 0 weights for model version 1467 [2024-09-01 16:51:51,962][36931] LearnerWorker_p0 finished initialization! [2024-09-01 16:51:51,973][36944] RunningMeanStd input shape: (3, 72, 128) [2024-09-01 16:51:51,978][36944] RunningMeanStd input shape: (1,) [2024-09-01 16:51:52,040][36944] ConvEncoder: input_channels=3 [2024-09-01 16:51:52,367][36944] Conv encoder output size: 512 [2024-09-01 16:51:52,368][36944] Policy head output size: 512 [2024-09-01 16:51:52,412][00194] Inference worker 0-0 is ready! [2024-09-01 16:51:52,414][00194] All inference workers are ready! Signal rollout workers to start! [2024-09-01 16:51:52,662][36952] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,660][36945] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,662][36951] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,669][36949] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,671][36948] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,665][36946] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,669][36947] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:52,673][36950] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:51:53,740][36945] Decorrelating experience for 0 frames... [2024-09-01 16:51:53,745][36952] Decorrelating experience for 0 frames... [2024-09-01 16:51:54,135][36951] Decorrelating experience for 0 frames... [2024-09-01 16:51:54,134][36947] Decorrelating experience for 0 frames... [2024-09-01 16:51:54,152][36949] Decorrelating experience for 0 frames... 
[2024-09-01 16:51:54,410][00194] Heartbeat connected on Batcher_0 [2024-09-01 16:51:54,416][00194] Heartbeat connected on LearnerWorker_p0 [2024-09-01 16:51:54,474][00194] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-01 16:51:54,950][36952] Decorrelating experience for 32 frames... [2024-09-01 16:51:55,094][36950] Decorrelating experience for 0 frames... [2024-09-01 16:51:55,447][36948] Decorrelating experience for 0 frames... [2024-09-01 16:51:55,904][36951] Decorrelating experience for 32 frames... [2024-09-01 16:51:55,911][36947] Decorrelating experience for 32 frames... [2024-09-01 16:51:55,918][36946] Decorrelating experience for 0 frames... [2024-09-01 16:51:55,970][36949] Decorrelating experience for 32 frames... [2024-09-01 16:51:56,181][36952] Decorrelating experience for 64 frames... [2024-09-01 16:51:56,547][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 6008832. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:51:56,554][36948] Decorrelating experience for 32 frames... [2024-09-01 16:51:57,059][36950] Decorrelating experience for 32 frames... [2024-09-01 16:51:57,364][36946] Decorrelating experience for 32 frames... [2024-09-01 16:51:57,598][36947] Decorrelating experience for 64 frames... [2024-09-01 16:51:57,629][36949] Decorrelating experience for 64 frames... [2024-09-01 16:51:57,941][36948] Decorrelating experience for 64 frames... [2024-09-01 16:51:58,443][36952] Decorrelating experience for 96 frames... [2024-09-01 16:51:58,691][36950] Decorrelating experience for 64 frames... [2024-09-01 16:51:58,812][00194] Heartbeat connected on RolloutWorker_w7 [2024-09-01 16:51:58,961][36951] Decorrelating experience for 64 frames... [2024-09-01 16:51:59,144][36946] Decorrelating experience for 64 frames... [2024-09-01 16:51:59,362][36947] Decorrelating experience for 96 frames... 
[2024-09-01 16:51:59,653][00194] Heartbeat connected on RolloutWorker_w0 [2024-09-01 16:52:00,159][36948] Decorrelating experience for 96 frames... [2024-09-01 16:52:00,704][00194] Heartbeat connected on RolloutWorker_w3 [2024-09-01 16:52:01,002][36951] Decorrelating experience for 96 frames... [2024-09-01 16:52:01,063][36949] Decorrelating experience for 96 frames... [2024-09-01 16:52:01,086][36945] Decorrelating experience for 32 frames... [2024-09-01 16:52:01,547][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6008832. Throughput: 0: 111.6. Samples: 558. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:52:01,552][00194] Avg episode reward: [(0, '0.960')] [2024-09-01 16:52:01,616][00194] Heartbeat connected on RolloutWorker_w6 [2024-09-01 16:52:01,645][00194] Heartbeat connected on RolloutWorker_w4 [2024-09-01 16:52:01,943][36946] Decorrelating experience for 96 frames... [2024-09-01 16:52:02,486][00194] Heartbeat connected on RolloutWorker_w2 [2024-09-01 16:52:03,080][36950] Decorrelating experience for 96 frames... [2024-09-01 16:52:04,137][36945] Decorrelating experience for 64 frames... [2024-09-01 16:52:04,212][00194] Heartbeat connected on RolloutWorker_w5 [2024-09-01 16:52:06,547][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6008832. Throughput: 0: 143.2. Samples: 1432. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:52:06,554][00194] Avg episode reward: [(0, '2.253')] [2024-09-01 16:52:08,803][36931] Signal inference workers to stop experience collection... [2024-09-01 16:52:08,902][36944] InferenceWorker_p0-w0: stopping experience collection [2024-09-01 16:52:10,471][36945] Decorrelating experience for 96 frames... [2024-09-01 16:52:11,106][00194] Heartbeat connected on RolloutWorker_w1 [2024-09-01 16:52:11,298][36931] Signal inference workers to resume experience collection... 
[2024-09-01 16:52:11,302][36944] InferenceWorker_p0-w0: resuming experience collection [2024-09-01 16:52:11,310][00194] Component Batcher_0 stopped! [2024-09-01 16:52:11,310][36931] Stopping Batcher_0... [2024-09-01 16:52:11,317][36931] Loop batcher_evt_loop terminating... [2024-09-01 16:52:11,376][36944] Weights refcount: 2 0 [2024-09-01 16:52:11,385][36944] Stopping InferenceWorker_p0-w0... [2024-09-01 16:52:11,386][36944] Loop inference_proc0-0_evt_loop terminating... [2024-09-01 16:52:11,386][00194] Component InferenceWorker_p0-w0 stopped! [2024-09-01 16:52:12,122][00194] Component RolloutWorker_w7 stopped! [2024-09-01 16:52:12,130][36952] Stopping RolloutWorker_w7... [2024-09-01 16:52:12,131][36952] Loop rollout_proc7_evt_loop terminating... [2024-09-01 16:52:12,167][00194] Component RolloutWorker_w3 stopped! [2024-09-01 16:52:12,173][36948] Stopping RolloutWorker_w3... [2024-09-01 16:52:12,173][36948] Loop rollout_proc3_evt_loop terminating... [2024-09-01 16:52:12,196][00194] Component RolloutWorker_w5 stopped! [2024-09-01 16:52:12,201][36950] Stopping RolloutWorker_w5... [2024-09-01 16:52:12,209][36950] Loop rollout_proc5_evt_loop terminating... [2024-09-01 16:52:12,217][00194] Component RolloutWorker_w1 stopped! [2024-09-01 16:52:12,222][00194] Component RolloutWorker_w6 stopped! [2024-09-01 16:52:12,222][36945] Stopping RolloutWorker_w1... [2024-09-01 16:52:12,222][36951] Stopping RolloutWorker_w6... [2024-09-01 16:52:12,228][36945] Loop rollout_proc1_evt_loop terminating... [2024-09-01 16:52:12,260][36951] Loop rollout_proc6_evt_loop terminating... [2024-09-01 16:52:12,284][00194] Component RolloutWorker_w0 stopped! [2024-09-01 16:52:12,286][36947] Stopping RolloutWorker_w0... [2024-09-01 16:52:12,309][00194] Component RolloutWorker_w2 stopped! [2024-09-01 16:52:12,324][36947] Loop rollout_proc0_evt_loop terminating... [2024-09-01 16:52:12,309][36946] Stopping RolloutWorker_w2... [2024-09-01 16:52:12,331][36946] Loop rollout_proc2_evt_loop terminating... 
[2024-09-01 16:52:12,458][36949] Stopping RolloutWorker_w4...
[2024-09-01 16:52:12,453][00194] Component RolloutWorker_w4 stopped!
[2024-09-01 16:52:12,459][36949] Loop rollout_proc4_evt_loop terminating...
[2024-09-01 16:52:16,876][36931] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001469_6017024.pth...
[2024-09-01 16:52:16,925][36931] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001449_5935104.pth
[2024-09-01 16:52:16,933][36931] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001469_6017024.pth...
[2024-09-01 16:52:17,028][36931] Stopping LearnerWorker_p0...
[2024-09-01 16:52:17,030][36931] Loop learner_proc0_evt_loop terminating...
[2024-09-01 16:52:17,029][00194] Component LearnerWorker_p0 stopped!
[2024-09-01 16:52:17,031][00194] Waiting for process learner_proc0 to stop...
[2024-09-01 16:52:17,438][00194] Waiting for process inference_proc0-0 to join...
[2024-09-01 16:52:17,444][00194] Waiting for process rollout_proc0 to join...
[2024-09-01 16:52:17,449][00194] Waiting for process rollout_proc1 to join...
[2024-09-01 16:52:17,455][00194] Waiting for process rollout_proc2 to join...
[2024-09-01 16:52:17,460][00194] Waiting for process rollout_proc3 to join...
[2024-09-01 16:52:17,464][00194] Waiting for process rollout_proc4 to join...
[2024-09-01 16:52:17,469][00194] Waiting for process rollout_proc5 to join...
[2024-09-01 16:52:17,474][00194] Waiting for process rollout_proc6 to join...
[2024-09-01 16:52:17,479][00194] Waiting for process rollout_proc7 to join...
[2024-09-01 16:52:17,484][00194] Batcher 0 profile tree view:
batching: 0.0483, releasing_batches: 0.0006
[2024-09-01 16:52:17,486][00194] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0562
wait_policy: 0.0001
  wait_policy_total: 7.2953
one_step: 0.0896
  handle_policy_step: 8.6890
    deserialize: 0.1233, stack: 0.0148, obs_to_device_normalize: 1.0116, forward: 6.9840, send_messages: 0.1645
    prepare_outputs: 0.1946
      to_cpu: 0.0106
[2024-09-01 16:52:17,489][00194] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 5.1628
train: 7.7603
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0010, after_optimizer: 0.0100
  calculate_losses: 3.0498
    losses_init: 0.0000, forward_head: 2.5822, bptt_initial: 0.0046, tail: 0.0180, advantages_returns: 0.0011, losses: 0.0092
    bptt: 0.4339
      bptt_forward_core: 0.4328
  update: 4.6981
    clip: 0.0075
[2024-09-01 16:52:17,491][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0015, enqueue_policy_requests: 0.3761, env_step: 4.9915, overhead: 0.1970, complete_rollouts: 0.0634
save_policy_outputs: 0.1590
  split_output_tensors: 0.0354
[2024-09-01 16:52:17,492][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0024, enqueue_policy_requests: 0.8959, env_step: 5.3992, overhead: 0.1658, complete_rollouts: 0.0540
save_policy_outputs: 0.2810
  split_output_tensors: 0.0944
[2024-09-01 16:52:17,494][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 16:52:17,496][00194] Runner profile tree view:
main_loop: 43.0191
[2024-09-01 16:52:17,500][00194] Collected {0: 6017024}, FPS: 190.4
[2024-09-01 16:53:04,116][00194] Environment doom_basic already registered, overwriting...
[2024-09-01 16:53:04,120][00194] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-01 16:53:04,122][00194] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-01 16:53:04,124][00194] Environment doom_dm already registered, overwriting...
[2024-09-01 16:53:04,126][00194] Environment doom_dwango5 already registered, overwriting...
[2024-09-01 16:53:04,127][00194] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-01 16:53:04,128][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-01 16:53:04,130][00194] Environment doom_my_way_home already registered, overwriting...
[2024-09-01 16:53:04,131][00194] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-01 16:53:04,132][00194] Environment doom_defend_the_center already registered, overwriting...
[2024-09-01 16:53:04,133][00194] Environment doom_defend_the_line already registered, overwriting...
[2024-09-01 16:53:04,134][00194] Environment doom_health_gathering already registered, overwriting...
[2024-09-01 16:53:04,136][00194] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-01 16:53:04,137][00194] Environment doom_battle already registered, overwriting...
[2024-09-01 16:53:04,138][00194] Environment doom_battle2 already registered, overwriting...
[2024-09-01 16:53:04,139][00194] Environment doom_duel_bots already registered, overwriting...
[2024-09-01 16:53:04,140][00194] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-01 16:53:04,142][00194] Environment doom_duel already registered, overwriting...
[2024-09-01 16:53:04,143][00194] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-01 16:53:04,144][00194] Environment doom_benchmark already registered, overwriting...
[2024-09-01 16:53:04,159][00194] register_encoder_factory:
[2024-09-01 16:53:04,178][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 16:53:04,181][00194] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line
[2024-09-01 16:53:04,189][00194] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-01 16:53:04,191][00194] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-01 16:53:04,193][00194] Weights and Biases integration disabled
[2024-09-01 16:53:04,198][00194] Environment var CUDA_VISIBLE_DEVICES is
[2024-09-01 16:53:06,150][00194] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=8000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-01 16:53:06,154][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-01 16:53:06,157][00194] Rollout worker 0 uses device cpu
[2024-09-01 16:53:06,159][00194] Rollout worker 1 uses device cpu
[2024-09-01 16:53:06,161][00194] Rollout worker 2 uses device cpu
[2024-09-01 16:53:06,162][00194] Rollout worker 3 uses device cpu
[2024-09-01 16:53:06,163][00194] Rollout worker 4 uses device cpu
[2024-09-01 16:53:06,165][00194] Rollout worker 5 uses device cpu
[2024-09-01 16:53:06,166][00194] Rollout worker 6 uses device cpu
[2024-09-01 16:53:06,167][00194] Rollout worker 7 uses device cpu
[2024-09-01 16:53:06,326][00194] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 16:53:06,367][00194] Starting all processes...
[2024-09-01 16:53:06,368][00194] Starting process learner_proc0
[2024-09-01 16:53:06,858][00194] Starting all processes...
[2024-09-01 16:53:06,867][00194] Starting process inference_proc0-0
[2024-09-01 16:53:06,868][00194] Starting process rollout_proc0
[2024-09-01 16:53:06,870][00194] Starting process rollout_proc1
[2024-09-01 16:53:06,871][00194] Starting process rollout_proc2
[2024-09-01 16:53:06,871][00194] Starting process rollout_proc3
[2024-09-01 16:53:06,871][00194] Starting process rollout_proc4
[2024-09-01 16:53:06,871][00194] Starting process rollout_proc5
[2024-09-01 16:53:06,871][00194] Starting process rollout_proc6
[2024-09-01 16:53:06,871][00194] Starting process rollout_proc7
[2024-09-01 16:53:21,630][37536] Starting seed is not provided
[2024-09-01 16:53:21,632][37536] Initializing actor-critic model on device cpu
[2024-09-01 16:53:21,634][37536] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:53:21,637][37536] RunningMeanStd input shape: (1,)
[2024-09-01 16:53:21,693][37561] Worker 7 uses CPU cores [1]
[2024-09-01 16:53:21,774][37536] ConvEncoder: input_channels=3
[2024-09-01 16:53:21,929][37556] Worker 2 uses CPU cores [0]
[2024-09-01 16:53:22,133][37559] Worker 5 uses CPU cores [1]
[2024-09-01 16:53:22,141][37555] Worker 0 uses CPU cores [0]
[2024-09-01 16:53:22,144][37558] Worker 4 uses CPU cores [0]
[2024-09-01 16:53:22,149][37560] Worker 6 uses CPU cores [0]
[2024-09-01 16:53:22,231][37557] Worker 3 uses CPU cores [1]
[2024-09-01 16:53:22,235][37551] Worker 1 uses CPU cores [1]
[2024-09-01 16:53:22,301][37536] Conv encoder output size: 512
[2024-09-01 16:53:22,302][37536] Policy head output size: 512
[2024-09-01 16:53:22,319][37536] Created Actor Critic model with architecture:
[2024-09-01 16:53:22,319][37536] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 16:53:22,800][37536] Using optimizer
[2024-09-01 16:53:22,802][37536] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001469_6017024.pth...
[2024-09-01 16:53:22,837][37536] Loading model from checkpoint
[2024-09-01 16:53:22,866][37536] Loaded experiment state at self.train_step=1469, self.env_steps=6017024
[2024-09-01 16:53:22,868][37536] Initialized policy 0 weights for model version 1469
[2024-09-01 16:53:22,874][37549] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:53:22,876][37549] RunningMeanStd input shape: (1,)
[2024-09-01 16:53:22,886][37536] LearnerWorker_p0 finished initialization!
[2024-09-01 16:53:22,901][37549] ConvEncoder: input_channels=3
[2024-09-01 16:53:23,087][37549] Conv encoder output size: 512
[2024-09-01 16:53:23,088][37549] Policy head output size: 512
[2024-09-01 16:53:23,115][00194] Inference worker 0-0 is ready!
[2024-09-01 16:53:23,117][00194] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 16:53:23,205][37556] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,203][37560] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,208][37558] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,214][37555] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,243][37559] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,250][37557] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,252][37551] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:23,248][37561] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:53:24,198][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 6017024. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:53:24,248][37557] Decorrelating experience for 0 frames...
[2024-09-01 16:53:24,256][37561] Decorrelating experience for 0 frames...
[2024-09-01 16:53:25,107][37555] Decorrelating experience for 0 frames...
[2024-09-01 16:53:25,112][37556] Decorrelating experience for 0 frames...
[2024-09-01 16:53:25,122][37560] Decorrelating experience for 0 frames...
[2024-09-01 16:53:25,128][37558] Decorrelating experience for 0 frames...
[2024-09-01 16:53:25,153][37557] Decorrelating experience for 32 frames...
[2024-09-01 16:53:25,156][37559] Decorrelating experience for 0 frames...
[2024-09-01 16:53:26,320][00194] Heartbeat connected on Batcher_0
[2024-09-01 16:53:26,322][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 16:53:26,361][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 16:53:26,934][37556] Decorrelating experience for 32 frames...
[2024-09-01 16:53:26,940][37560] Decorrelating experience for 32 frames...
[2024-09-01 16:53:26,936][37555] Decorrelating experience for 32 frames...
[2024-09-01 16:53:27,141][37551] Decorrelating experience for 0 frames...
[2024-09-01 16:53:27,217][37559] Decorrelating experience for 32 frames...
[2024-09-01 16:53:27,221][37561] Decorrelating experience for 32 frames...
[2024-09-01 16:53:28,842][37558] Decorrelating experience for 32 frames...
[2024-09-01 16:53:29,065][37556] Decorrelating experience for 64 frames...
[2024-09-01 16:53:29,082][37560] Decorrelating experience for 64 frames...
[2024-09-01 16:53:29,198][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6017024. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:53:29,611][37551] Decorrelating experience for 32 frames...
[2024-09-01 16:53:29,633][37557] Decorrelating experience for 64 frames...
[2024-09-01 16:53:29,954][37561] Decorrelating experience for 64 frames...
[2024-09-01 16:53:31,307][37558] Decorrelating experience for 64 frames...
[2024-09-01 16:53:31,540][37556] Decorrelating experience for 96 frames...
[2024-09-01 16:53:32,089][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 16:53:32,242][37559] Decorrelating experience for 64 frames...
[2024-09-01 16:53:32,432][37555] Decorrelating experience for 64 frames...
[2024-09-01 16:53:32,648][37551] Decorrelating experience for 64 frames...
[2024-09-01 16:53:32,658][37557] Decorrelating experience for 96 frames...
[2024-09-01 16:53:33,028][37561] Decorrelating experience for 96 frames...
[2024-09-01 16:53:33,177][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 16:53:33,709][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 16:53:34,199][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6017024. Throughput: 0: 1.2. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:53:34,219][00194] Avg episode reward: [(0, '1.710')]
[2024-09-01 16:53:34,974][37560] Decorrelating experience for 96 frames...
[2024-09-01 16:53:35,348][37558] Decorrelating experience for 96 frames...
[2024-09-01 16:53:35,712][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 16:53:35,846][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 16:53:36,250][37555] Decorrelating experience for 96 frames...
[2024-09-01 16:53:36,590][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 16:53:36,753][37551] Decorrelating experience for 96 frames...
[2024-09-01 16:53:36,765][37559] Decorrelating experience for 96 frames...
[2024-09-01 16:53:37,115][00194] Heartbeat connected on RolloutWorker_w1
[2024-09-01 16:53:37,146][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 16:53:39,198][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6017024. Throughput: 0: 101.9. Samples: 1528. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 16:53:39,202][00194] Avg episode reward: [(0, '3.255')]
[2024-09-01 16:53:40,443][37536] Signal inference workers to stop experience collection...
[2024-09-01 16:53:40,515][37549] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 16:53:41,910][37536] Signal inference workers to resume experience collection...
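The "Fps is ..." and "Avg episode reward" status lines above follow a fixed format. A minimal stdlib-only sketch of a parser for these lines (the `parse_status` helper and regexes are assumptions for illustration, not part of Sample Factory itself):

```python
import re

# Matches status lines like:
#   Fps is (10 sec: 409.6, ...). Total num frames: 6021120. ... Samples: 2922. ...
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+|nan).*?Total num frames: (\d+).*?Samples: (\d+)"
)
# Matches lines like: Avg episode reward: [(0, '3.862')]
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

def parse_status(line: str):
    """Return (fps_10s, total_frames, samples) for an Fps line, or None."""
    m = FPS_RE.search(line)
    if not m:
        return None
    fps = float(m.group(1))  # float('nan') covers the startup 'nan' readings
    return fps, int(m.group(2)), int(m.group(3))

line = ("[2024-09-01 16:53:44,198][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, "
        "300 sec: 204.8). Total num frames: 6021120. Throughput: 0: 146.1. "
        "Samples: 2922. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)")
print(parse_status(line))  # → (409.6, 6021120, 2922)
```

Running such a parser over the full log yields a throughput/reward time series without re-running training.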
[2024-09-01 16:53:41,913][37549] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 16:53:44,198][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 6021120. Throughput: 0: 146.1. Samples: 2922. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 16:53:44,201][00194] Avg episode reward: [(0, '3.862')]
[2024-09-01 16:53:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 6025216. Throughput: 0: 143.4. Samples: 3584. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 16:53:49,207][00194] Avg episode reward: [(0, '3.862')]
[2024-09-01 16:53:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 6029312. Throughput: 0: 152.4. Samples: 4572. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:53:54,200][00194] Avg episode reward: [(0, '5.013')]
[2024-09-01 16:53:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 6033408. Throughput: 0: 181.3. Samples: 6344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:53:59,205][00194] Avg episode reward: [(0, '7.991')]
[2024-09-01 16:54:04,204][00194] Fps is (10 sec: 818.7, 60 sec: 511.9, 300 sec: 511.9). Total num frames: 6037504. Throughput: 0: 171.7. Samples: 6870. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:54:04,208][00194] Avg episode reward: [(0, '7.991')]
[2024-09-01 16:54:09,201][00194] Fps is (10 sec: 819.0, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 6041600. Throughput: 0: 179.2. Samples: 8064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:54:09,204][00194] Avg episode reward: [(0, '9.398')]
[2024-09-01 16:54:14,199][00194] Fps is (10 sec: 819.7, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 6045696. Throughput: 0: 217.8. Samples: 9800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:54:14,206][00194] Avg episode reward: [(0, '10.789')]
[2024-09-01 16:54:19,198][00194] Fps is (10 sec: 819.4, 60 sec: 595.8, 300 sec: 595.8). Total num frames: 6049792. Throughput: 0: 233.6. Samples: 10522. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:54:19,202][00194] Avg episode reward: [(0, '12.606')]
[2024-09-01 16:54:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 6053888. Throughput: 0: 230.6. Samples: 11906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:54:24,202][00194] Avg episode reward: [(0, '13.361')]
[2024-09-01 16:54:24,252][37549] Updated weights for policy 0, policy_version 1479 (0.1580)
[2024-09-01 16:54:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 630.2). Total num frames: 6057984. Throughput: 0: 222.4. Samples: 12928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:54:29,200][00194] Avg episode reward: [(0, '14.919')]
[2024-09-01 16:54:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 6066176. Throughput: 0: 230.8. Samples: 13970. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:54:34,200][00194] Avg episode reward: [(0, '15.589')]
[2024-09-01 16:54:39,199][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 710.0). Total num frames: 6070272. Throughput: 0: 242.8. Samples: 15496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:54:39,205][00194] Avg episode reward: [(0, '16.970')]
[2024-09-01 16:54:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 716.8). Total num frames: 6074368. Throughput: 0: 228.0. Samples: 16604. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:54:44,202][00194] Avg episode reward: [(0, '17.311')]
[2024-09-01 16:54:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 722.8). Total num frames: 6078464. Throughput: 0: 226.2. Samples: 17046. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:54:49,208][00194] Avg episode reward: [(0, '18.330')]
[2024-09-01 16:54:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 6082560. Throughput: 0: 238.9. Samples: 18816. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:54:54,205][00194] Avg episode reward: [(0, '18.873')]
[2024-09-01 16:54:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 776.1). Total num frames: 6090752. Throughput: 0: 229.6. Samples: 20132. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:54:59,201][00194] Avg episode reward: [(0, '19.758')]
[2024-09-01 16:55:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 737.3). Total num frames: 6090752. Throughput: 0: 226.4. Samples: 20710. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:55:04,200][00194] Avg episode reward: [(0, '20.737')]
[2024-09-01 16:55:08,372][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth...
[2024-09-01 16:55:08,377][37549] Updated weights for policy 0, policy_version 1489 (0.1222)
[2024-09-01 16:55:08,486][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth
[2024-09-01 16:55:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 780.2). Total num frames: 6098944. Throughput: 0: 228.0. Samples: 22164. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:55:09,205][00194] Avg episode reward: [(0, '20.504')]
[2024-09-01 16:55:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 782.0). Total num frames: 6103040. Throughput: 0: 247.3. Samples: 24056. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:55:14,201][00194] Avg episode reward: [(0, '21.430')]
[2024-09-01 16:55:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 783.6). Total num frames: 6107136. Throughput: 0: 230.0. Samples: 24318. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:55:19,204][00194] Avg episode reward: [(0, '21.430')]
[2024-09-01 16:55:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 785.1). Total num frames: 6111232. Throughput: 0: 223.9. Samples: 25572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:55:24,200][00194] Avg episode reward: [(0, '21.373')]
[2024-09-01 16:55:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 786.4). Total num frames: 6115328. Throughput: 0: 237.5. Samples: 27292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:55:29,208][00194] Avg episode reward: [(0, '21.905')]
[2024-09-01 16:55:34,200][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 6123520. Throughput: 0: 245.9. Samples: 28112. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:55:34,202][00194] Avg episode reward: [(0, '22.764')]
[2024-09-01 16:55:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 788.9). Total num frames: 6123520. Throughput: 0: 233.1. Samples: 29306. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:55:39,201][00194] Avg episode reward: [(0, '23.171')]
[2024-09-01 16:55:44,198][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 6131712. Throughput: 0: 232.3. Samples: 30584. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:55:44,200][00194] Avg episode reward: [(0, '23.682')]
[2024-09-01 16:55:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 6135808. Throughput: 0: 236.5. Samples: 31352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:55:49,201][00194] Avg episode reward: [(0, '24.860')]
[2024-09-01 16:55:51,392][37549] Updated weights for policy 0, policy_version 1499 (0.0513)
[2024-09-01 16:55:54,199][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 6139904. Throughput: 0: 237.2. Samples: 32836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:55:54,202][00194] Avg episode reward: [(0, '26.227')]
[2024-09-01 16:55:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 6144000. Throughput: 0: 221.8. Samples: 34036. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:55:59,206][00194] Avg episode reward: [(0, '26.715')]
[2024-09-01 16:56:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 6148096. Throughput: 0: 232.5. Samples: 34782. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:04,201][00194] Avg episode reward: [(0, '27.059')]
[2024-09-01 16:56:09,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 844.0). Total num frames: 6156288. Throughput: 0: 241.9. Samples: 36458. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:09,207][00194] Avg episode reward: [(0, '28.093')]
[2024-09-01 16:56:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 6156288. Throughput: 0: 225.8. Samples: 37454. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:14,201][00194] Avg episode reward: [(0, '28.175')]
[2024-09-01 16:56:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 842.6). Total num frames: 6164480. Throughput: 0: 222.8. Samples: 38138. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:19,207][00194] Avg episode reward: [(0, '28.414')]
[2024-09-01 16:56:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 842.0). Total num frames: 6168576. Throughput: 0: 230.9. Samples: 39696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:24,205][00194] Avg episode reward: [(0, '28.774')]
[2024-09-01 16:56:29,200][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 841.3). Total num frames: 6172672. Throughput: 0: 221.5. Samples: 40552. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:29,206][00194] Avg episode reward: [(0, '28.774')]
[2024-09-01 16:56:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 840.8). Total num frames: 6176768. Throughput: 0: 228.5. Samples: 41634. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:56:34,202][00194] Avg episode reward: [(0, '28.816')]
[2024-09-01 16:56:36,742][37549] Updated weights for policy 0, policy_version 1509 (0.1582)
[2024-09-01 16:56:39,198][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 840.2). Total num frames: 6180864. Throughput: 0: 230.0. Samples: 43186. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:56:39,206][00194] Avg episode reward: [(0, '28.860')]
[2024-09-01 16:56:44,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 860.2). Total num frames: 6189056. Throughput: 0: 236.8. Samples: 44692. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:56:44,205][00194] Avg episode reward: [(0, '28.796')]
[2024-09-01 16:56:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 6189056. Throughput: 0: 235.4. Samples: 45374. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:56:49,201][00194] Avg episode reward: [(0, '29.156')]
[2024-09-01 16:56:54,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 838.7). Total num frames: 6193152. Throughput: 0: 225.2. Samples: 46594. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:54,206][00194] Avg episode reward: [(0, '29.288')]
[2024-09-01 16:56:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 857.3). Total num frames: 6201344. Throughput: 0: 236.8. Samples: 48110. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:56:59,202][00194] Avg episode reward: [(0, '29.215')]
[2024-09-01 16:57:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 856.4). Total num frames: 6205440. Throughput: 0: 237.5. Samples: 48826. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:57:04,200][00194] Avg episode reward: [(0, '28.978')]
[2024-09-01 16:57:06,941][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001516_6209536.pth...
[2024-09-01 16:57:07,097][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001469_6017024.pth
[2024-09-01 16:57:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 855.6). Total num frames: 6209536. Throughput: 0: 231.1. Samples: 50096. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:57:09,201][00194] Avg episode reward: [(0, '29.174')]
[2024-09-01 16:57:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 854.8). Total num frames: 6213632. Throughput: 0: 246.7. Samples: 51652. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:57:14,204][00194] Avg episode reward: [(0, '29.462')]
[2024-09-01 16:57:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 854.1). Total num frames: 6217728. Throughput: 0: 237.8. Samples: 52334. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:57:19,210][00194] Avg episode reward: [(0, '28.760')]
[2024-09-01 16:57:19,733][37549] Updated weights for policy 0, policy_version 1519 (0.0519)
[2024-09-01 16:57:22,940][37536] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 16:57:23,058][37549] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 16:57:24,199][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 853.3). Total num frames: 6221824. Throughput: 0: 238.7. Samples: 53928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:57:24,202][00194] Avg episode reward: [(0, '28.760')]
[2024-09-01 16:57:24,833][37536] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 16:57:24,835][37549] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 16:57:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 852.6). Total num frames: 6225920. Throughput: 0: 226.9. Samples: 54902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:57:29,201][00194] Avg episode reward: [(0, '29.489')]
[2024-09-01 16:57:34,198][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 868.4). Total num frames: 6234112. Throughput: 0: 234.1. Samples: 55908. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:57:34,204][00194] Avg episode reward: [(0, '29.720')]
[2024-09-01 16:57:39,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 867.4). Total num frames: 6238208. Throughput: 0: 235.9. Samples: 57208. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:57:39,203][00194] Avg episode reward: [(0, '29.776')]
[2024-09-01 16:57:44,202][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 866.4). Total num frames: 6242304. Throughput: 0: 226.8. Samples: 58316. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:57:44,205][00194] Avg episode reward: [(0, '29.802')]
[2024-09-01 16:57:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 865.6). Total num frames: 6246400. Throughput: 0: 226.3. Samples: 59010. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:57:49,201][00194] Avg episode reward: [(0, '29.778')]
[2024-09-01 16:57:54,198][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 864.7). Total num frames: 6250496. Throughput: 0: 234.5. Samples: 60650. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:57:54,205][00194] Avg episode reward: [(0, '29.512')]
[2024-09-01 16:57:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 863.9). Total num frames: 6254592. Throughput: 0: 230.4. Samples: 62020. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:57:59,200][00194] Avg episode reward: [(0, '29.512')]
[2024-09-01 16:58:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 863.1). Total num frames: 6258688. Throughput: 0: 227.8. Samples: 62586. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:58:04,203][00194] Avg episode reward: [(0, '29.576')]
[2024-09-01 16:58:05,139][37549] Updated weights for policy 0, policy_version 1529 (0.2734)
[2024-09-01 16:58:09,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 876.7). Total num frames: 6266880. Throughput: 0: 226.0. Samples: 64098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:58:09,206][00194] Avg episode reward: [(0, '28.925')]
[2024-09-01 16:58:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 875.7). Total num frames: 6270976. Throughput: 0: 241.5. Samples: 65770. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:58:14,205][00194] Avg episode reward: [(0, '29.390')]
[2024-09-01 16:58:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 6275072. Throughput: 0: 228.9. Samples: 66208. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:58:19,204][00194] Avg episode reward: [(0, '29.800')]
[2024-09-01 16:58:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 6279168. Throughput: 0: 224.3. Samples: 67300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:58:24,201][00194] Avg episode reward: [(0, '29.446')]
[2024-09-01 16:58:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6283264. Throughput: 0: 244.1. Samples: 69298. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:58:29,210][00194] Avg episode reward: [(0, '29.792')]
[2024-09-01 16:58:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6287360. Throughput: 0: 245.8. Samples: 70070.
Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 16:58:34,200][00194] Avg episode reward: [(0, '29.670')] [2024-09-01 16:58:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6291456. Throughput: 0: 233.6. Samples: 71164. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:58:39,202][00194] Avg episode reward: [(0, '29.929')] [2024-09-01 16:58:44,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 6299648. Throughput: 0: 229.8. Samples: 72362. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:58:44,206][00194] Avg episode reward: [(0, '30.028')] [2024-09-01 16:58:48,277][37549] Updated weights for policy 0, policy_version 1539 (0.1468) [2024-09-01 16:58:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6303744. Throughput: 0: 239.4. Samples: 73358. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:58:49,204][00194] Avg episode reward: [(0, '29.766')] [2024-09-01 16:58:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6307840. Throughput: 0: 232.8. Samples: 74572. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:58:54,201][00194] Avg episode reward: [(0, '29.728')] [2024-09-01 16:58:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6311936. Throughput: 0: 223.9. Samples: 75846. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:58:59,202][00194] Avg episode reward: [(0, '29.369')] [2024-09-01 16:59:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6316032. Throughput: 0: 230.8. Samples: 76596. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:59:04,201][00194] Avg episode reward: [(0, '29.093')] [2024-09-01 16:59:05,639][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001543_6320128.pth... 
[2024-09-01 16:59:05,744][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth [2024-09-01 16:59:09,199][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6320128. Throughput: 0: 249.8. Samples: 78542. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:59:09,201][00194] Avg episode reward: [(0, '28.602')] [2024-09-01 16:59:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6324224. Throughput: 0: 226.7. Samples: 79500. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:14,203][00194] Avg episode reward: [(0, '28.184')] [2024-09-01 16:59:19,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6328320. Throughput: 0: 221.5. Samples: 80036. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:19,206][00194] Avg episode reward: [(0, '27.939')] [2024-09-01 16:59:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 944.2). Total num frames: 6336512. Throughput: 0: 231.0. Samples: 81558. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:24,200][00194] Avg episode reward: [(0, '27.812')] [2024-09-01 16:59:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6340608. Throughput: 0: 227.7. Samples: 82610. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:29,200][00194] Avg episode reward: [(0, '27.393')] [2024-09-01 16:59:33,527][37549] Updated weights for policy 0, policy_version 1549 (0.0056) [2024-09-01 16:59:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6344704. Throughput: 0: 226.7. Samples: 83560. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:34,201][00194] Avg episode reward: [(0, '27.908')] [2024-09-01 16:59:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6348800. Throughput: 0: 223.5. Samples: 84630. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:39,201][00194] Avg episode reward: [(0, '27.908')] [2024-09-01 16:59:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6352896. Throughput: 0: 238.7. Samples: 86586. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 16:59:44,201][00194] Avg episode reward: [(0, '27.568')] [2024-09-01 16:59:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6356992. Throughput: 0: 230.0. Samples: 86946. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:59:49,211][00194] Avg episode reward: [(0, '27.675')] [2024-09-01 16:59:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6361088. Throughput: 0: 213.0. Samples: 88126. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:59:54,201][00194] Avg episode reward: [(0, '27.472')] [2024-09-01 16:59:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 944.2). Total num frames: 6369280. Throughput: 0: 211.4. Samples: 89012. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:59:59,207][00194] Avg episode reward: [(0, '26.135')] [2024-09-01 17:00:04,200][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6373376. Throughput: 0: 235.7. Samples: 90642. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:04,209][00194] Avg episode reward: [(0, '27.509')] [2024-09-01 17:00:09,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6373376. Throughput: 0: 226.1. Samples: 91734. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:09,201][00194] Avg episode reward: [(0, '27.167')] [2024-09-01 17:00:14,198][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6381568. Throughput: 0: 228.0. Samples: 92868. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:14,200][00194] Avg episode reward: [(0, '27.166')] [2024-09-01 17:00:17,307][37549] Updated weights for policy 0, policy_version 1559 (0.2512) [2024-09-01 17:00:19,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6385664. Throughput: 0: 228.6. Samples: 93846. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:19,202][00194] Avg episode reward: [(0, '27.823')] [2024-09-01 17:00:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6389760. Throughput: 0: 234.0. Samples: 95160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:24,201][00194] Avg episode reward: [(0, '27.622')] [2024-09-01 17:00:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6393856. Throughput: 0: 218.0. Samples: 96394. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:29,208][00194] Avg episode reward: [(0, '27.291')] [2024-09-01 17:00:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6397952. Throughput: 0: 230.8. Samples: 97334. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:34,207][00194] Avg episode reward: [(0, '27.292')] [2024-09-01 17:00:39,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6406144. Throughput: 0: 237.8. Samples: 98828. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:39,201][00194] Avg episode reward: [(0, '27.098')] [2024-09-01 17:00:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6406144. Throughput: 0: 240.8. Samples: 99848. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:00:44,201][00194] Avg episode reward: [(0, '27.058')] [2024-09-01 17:00:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6414336. Throughput: 0: 226.8. Samples: 100848. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:00:49,201][00194] Avg episode reward: [(0, '26.901')] [2024-09-01 17:00:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6418432. Throughput: 0: 234.0. Samples: 102264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:00:54,205][00194] Avg episode reward: [(0, '27.528')] [2024-09-01 17:00:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6422528. Throughput: 0: 225.0. Samples: 102992. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:00:59,203][00194] Avg episode reward: [(0, '27.417')] [2024-09-01 17:01:03,121][37549] Updated weights for policy 0, policy_version 1569 (0.1692) [2024-09-01 17:01:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6426624. Throughput: 0: 225.3. Samples: 103986. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:01:04,208][00194] Avg episode reward: [(0, '27.863')] [2024-09-01 17:01:05,440][37536] Signal inference workers to stop experience collection... (100 times) [2024-09-01 17:01:05,493][37549] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-09-01 17:01:06,406][37536] Signal inference workers to resume experience collection... (100 times) [2024-09-01 17:01:06,407][37549] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-09-01 17:01:06,406][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001570_6430720.pth... [2024-09-01 17:01:06,518][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001516_6209536.pth [2024-09-01 17:01:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6430720. Throughput: 0: 235.1. Samples: 105740. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:01:09,209][00194] Avg episode reward: [(0, '27.684')] [2024-09-01 17:01:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6434816. Throughput: 0: 235.3. Samples: 106982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:01:14,201][00194] Avg episode reward: [(0, '27.647')] [2024-09-01 17:01:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6438912. Throughput: 0: 227.7. Samples: 107580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:01:19,201][00194] Avg episode reward: [(0, '27.699')] [2024-09-01 17:01:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6447104. Throughput: 0: 228.0. Samples: 109086. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:01:24,206][00194] Avg episode reward: [(0, '27.621')] [2024-09-01 17:01:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6451200. Throughput: 0: 236.4. Samples: 110486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:01:29,206][00194] Avg episode reward: [(0, '27.615')] [2024-09-01 17:01:34,202][00194] Fps is (10 sec: 818.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6455296. Throughput: 0: 227.2. Samples: 111074. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:01:34,207][00194] Avg episode reward: [(0, '27.789')] [2024-09-01 17:01:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6459392. Throughput: 0: 220.8. Samples: 112200. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:01:39,202][00194] Avg episode reward: [(0, '27.868')] [2024-09-01 17:01:44,198][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6463488. Throughput: 0: 245.9. Samples: 114056. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:01:44,204][00194] Avg episode reward: [(0, '27.476')] [2024-09-01 17:01:45,013][37549] Updated weights for policy 0, policy_version 1579 (0.1680) [2024-09-01 17:01:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 944.2). Total num frames: 6471680. Throughput: 0: 244.4. Samples: 114986. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:01:49,206][00194] Avg episode reward: [(0, '27.617')] [2024-09-01 17:01:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6471680. Throughput: 0: 231.0. Samples: 116136. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:01:54,203][00194] Avg episode reward: [(0, '27.766')] [2024-09-01 17:01:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6479872. Throughput: 0: 229.3. Samples: 117300. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:01:59,204][00194] Avg episode reward: [(0, '26.220')] [2024-09-01 17:02:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6483968. Throughput: 0: 238.1. Samples: 118296. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:02:04,200][00194] Avg episode reward: [(0, '25.472')] [2024-09-01 17:02:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6488064. Throughput: 0: 234.3. Samples: 119628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:09,204][00194] Avg episode reward: [(0, '25.817')] [2024-09-01 17:02:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6492160. Throughput: 0: 227.7. Samples: 120732. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:14,203][00194] Avg episode reward: [(0, '25.376')] [2024-09-01 17:02:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6496256. Throughput: 0: 231.5. Samples: 121490. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:19,200][00194] Avg episode reward: [(0, '24.888')] [2024-09-01 17:02:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6500352. Throughput: 0: 248.8. Samples: 123396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:24,201][00194] Avg episode reward: [(0, '24.905')] [2024-09-01 17:02:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6504448. Throughput: 0: 230.5. Samples: 124428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:02:29,200][00194] Avg episode reward: [(0, '25.465')] [2024-09-01 17:02:30,447][37549] Updated weights for policy 0, policy_version 1589 (0.0056) [2024-09-01 17:02:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6508544. Throughput: 0: 220.5. Samples: 124910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:02:34,200][00194] Avg episode reward: [(0, '26.609')] [2024-09-01 17:02:39,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6516736. Throughput: 0: 229.5. Samples: 126462. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:39,201][00194] Avg episode reward: [(0, '25.983')] [2024-09-01 17:02:44,200][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6520832. Throughput: 0: 226.7. Samples: 127504. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:44,203][00194] Avg episode reward: [(0, '25.790')] [2024-09-01 17:02:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6524928. Throughput: 0: 227.6. Samples: 128536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:49,206][00194] Avg episode reward: [(0, '24.858')] [2024-09-01 17:02:54,198][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6529024. Throughput: 0: 227.2. Samples: 129854. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:54,200][00194] Avg episode reward: [(0, '24.719')] [2024-09-01 17:02:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6533120. Throughput: 0: 240.2. Samples: 131540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:02:59,208][00194] Avg episode reward: [(0, '25.157')] [2024-09-01 17:03:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6537216. Throughput: 0: 237.3. Samples: 132170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:03:04,202][00194] Avg episode reward: [(0, '24.973')] [2024-09-01 17:03:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6541312. Throughput: 0: 220.0. Samples: 133298. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:03:09,203][00194] Avg episode reward: [(0, '24.691')] [2024-09-01 17:03:10,064][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth... [2024-09-01 17:03:10,154][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001543_6320128.pth [2024-09-01 17:03:13,832][37549] Updated weights for policy 0, policy_version 1599 (0.1152) [2024-09-01 17:03:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6549504. Throughput: 0: 231.4. Samples: 134842. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:03:14,200][00194] Avg episode reward: [(0, '24.283')] [2024-09-01 17:03:19,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6553600. Throughput: 0: 241.2. Samples: 135762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:03:19,204][00194] Avg episode reward: [(0, '24.639')] [2024-09-01 17:03:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6557696. Throughput: 0: 229.5. Samples: 136790. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:03:24,205][00194] Avg episode reward: [(0, '24.639')] [2024-09-01 17:03:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6561792. Throughput: 0: 239.9. Samples: 138300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:03:29,205][00194] Avg episode reward: [(0, '25.056')] [2024-09-01 17:03:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6565888. Throughput: 0: 229.9. Samples: 138882. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:03:34,205][00194] Avg episode reward: [(0, '24.925')] [2024-09-01 17:03:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6569984. Throughput: 0: 241.0. Samples: 140700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:03:39,201][00194] Avg episode reward: [(0, '25.620')] [2024-09-01 17:03:44,203][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 6574080. Throughput: 0: 228.2. Samples: 141808. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:03:44,205][00194] Avg episode reward: [(0, '25.936')] [2024-09-01 17:03:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6578176. Throughput: 0: 233.2. Samples: 142664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:03:49,204][00194] Avg episode reward: [(0, '25.742')] [2024-09-01 17:03:54,200][00194] Fps is (10 sec: 1229.3, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6586368. Throughput: 0: 240.6. Samples: 144124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:03:54,204][00194] Avg episode reward: [(0, '25.189')] [2024-09-01 17:03:58,144][37549] Updated weights for policy 0, policy_version 1609 (0.0487) [2024-09-01 17:03:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6590464. Throughput: 0: 226.0. Samples: 145014. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:03:59,202][00194] Avg episode reward: [(0, '26.359')] [2024-09-01 17:04:04,198][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6594560. Throughput: 0: 228.3. Samples: 146036. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:04,201][00194] Avg episode reward: [(0, '26.539')] [2024-09-01 17:04:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6598656. Throughput: 0: 241.6. Samples: 147660. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:09,202][00194] Avg episode reward: [(0, '27.376')] [2024-09-01 17:04:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6602752. Throughput: 0: 241.1. Samples: 149148. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:14,200][00194] Avg episode reward: [(0, '27.918')] [2024-09-01 17:04:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6606848. Throughput: 0: 236.0. Samples: 149502. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:19,200][00194] Avg episode reward: [(0, '27.779')] [2024-09-01 17:04:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6610944. Throughput: 0: 227.6. Samples: 150944. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:24,207][00194] Avg episode reward: [(0, '27.152')] [2024-09-01 17:04:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6619136. Throughput: 0: 240.8. Samples: 152644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:04:29,201][00194] Avg episode reward: [(0, '26.917')] [2024-09-01 17:04:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6623232. Throughput: 0: 236.0. Samples: 153286. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:04:34,201][00194] Avg episode reward: [(0, '26.917')] [2024-09-01 17:04:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6627328. Throughput: 0: 224.9. Samples: 154244. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:39,206][00194] Avg episode reward: [(0, '27.449')] [2024-09-01 17:04:42,634][37549] Updated weights for policy 0, policy_version 1619 (0.1409) [2024-09-01 17:04:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 6631424. Throughput: 0: 244.0. Samples: 155992. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:04:44,200][00194] Avg episode reward: [(0, '26.953')] [2024-09-01 17:04:44,979][37536] Signal inference workers to stop experience collection... (150 times) [2024-09-01 17:04:45,049][37549] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-09-01 17:04:45,940][37536] Signal inference workers to resume experience collection... (150 times) [2024-09-01 17:04:45,941][37549] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-09-01 17:04:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 6635520. Throughput: 0: 236.7. Samples: 156686. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:04:49,206][00194] Avg episode reward: [(0, '26.836')] [2024-09-01 17:04:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6639616. Throughput: 0: 225.5. Samples: 157808. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:04:54,205][00194] Avg episode reward: [(0, '27.435')] [2024-09-01 17:04:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6643712. Throughput: 0: 204.8. Samples: 158364. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 17:04:59,202][00194] Avg episode reward: [(0, '27.267')] [2024-09-01 17:05:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 6647808. Throughput: 0: 221.2. Samples: 159454. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 17:05:04,205][00194] Avg episode reward: [(0, '27.540')] [2024-09-01 17:05:09,206][00194] Fps is (10 sec: 409.3, 60 sec: 819.1, 300 sec: 902.5). Total num frames: 6647808. Throughput: 0: 210.1. Samples: 160402. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 17:05:09,211][00194] Avg episode reward: [(0, '27.960')] [2024-09-01 17:05:10,310][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001624_6651904.pth... [2024-09-01 17:05:10,403][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001570_6430720.pth [2024-09-01 17:05:14,198][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6651904. Throughput: 0: 180.2. Samples: 160754. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:05:14,206][00194] Avg episode reward: [(0, '28.224')] [2024-09-01 17:05:19,198][00194] Fps is (10 sec: 819.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6656000. Throughput: 0: 190.5. Samples: 161858. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 17:05:19,210][00194] Avg episode reward: [(0, '29.049')] [2024-09-01 17:05:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6664192. Throughput: 0: 208.2. Samples: 163614. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0) [2024-09-01 17:05:24,207][00194] Avg episode reward: [(0, '29.408')] [2024-09-01 17:05:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 916.4). Total num frames: 6668288. Throughput: 0: 182.3. Samples: 164194. 
Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-09-01 17:05:29,202][00194] Avg episode reward: [(0, '29.965')]
[2024-09-01 17:05:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6672384. Throughput: 0: 193.0. Samples: 165370. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-09-01 17:05:34,200][00194] Avg episode reward: [(0, '29.900')]
[2024-09-01 17:05:34,657][37549] Updated weights for policy 0, policy_version 1629 (0.2727)
[2024-09-01 17:05:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 916.4). Total num frames: 6676480. Throughput: 0: 195.3. Samples: 166596. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-09-01 17:05:39,203][00194] Avg episode reward: [(0, '29.789')]
[2024-09-01 17:05:44,200][00194] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6680576. Throughput: 0: 219.5. Samples: 168242. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-09-01 17:05:44,212][00194] Avg episode reward: [(0, '29.686')]
[2024-09-01 17:05:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6684672. Throughput: 0: 205.1. Samples: 168682. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:05:49,200][00194] Avg episode reward: [(0, '29.294')]
[2024-09-01 17:05:54,198][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6688768. Throughput: 0: 211.6. Samples: 169924. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:05:54,201][00194] Avg episode reward: [(0, '29.531')]
[2024-09-01 17:05:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6692864. Throughput: 0: 243.2. Samples: 171698. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:05:59,206][00194] Avg episode reward: [(0, '29.262')]
[2024-09-01 17:06:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 6696960. Throughput: 0: 232.5. Samples: 172320. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:04,201][00194] Avg episode reward: [(0, '29.011')]
[2024-09-01 17:06:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 6701056. Throughput: 0: 215.2. Samples: 173296. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:09,201][00194] Avg episode reward: [(0, '29.111')]
[2024-09-01 17:06:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 6709248. Throughput: 0: 236.3. Samples: 174828. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:06:14,203][00194] Avg episode reward: [(0, '29.603')]
[2024-09-01 17:06:17,835][37549] Updated weights for policy 0, policy_version 1639 (0.3126)
[2024-09-01 17:06:19,199][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6713344. Throughput: 0: 232.0. Samples: 175810. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:06:19,202][00194] Avg episode reward: [(0, '29.591')]
[2024-09-01 17:06:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6717440. Throughput: 0: 228.1. Samples: 176860. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:06:24,204][00194] Avg episode reward: [(0, '29.287')]
[2024-09-01 17:06:29,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6721536. Throughput: 0: 221.6. Samples: 178212. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:06:29,206][00194] Avg episode reward: [(0, '28.904')]
[2024-09-01 17:06:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6725632. Throughput: 0: 227.3. Samples: 178910. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:34,200][00194] Avg episode reward: [(0, '30.199')]
[2024-09-01 17:06:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6729728. Throughput: 0: 238.2. Samples: 180644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:39,204][00194] Avg episode reward: [(0, '30.178')]
[2024-09-01 17:06:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6733824. Throughput: 0: 219.7. Samples: 181584. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:44,201][00194] Avg episode reward: [(0, '30.172')]
[2024-09-01 17:06:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6737920. Throughput: 0: 216.5. Samples: 182062. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:49,208][00194] Avg episode reward: [(0, '30.673')]
[2024-09-01 17:06:54,202][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6746112. Throughput: 0: 234.2. Samples: 183836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:06:54,211][00194] Avg episode reward: [(0, '30.995')]
[2024-09-01 17:06:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6750208. Throughput: 0: 212.1. Samples: 184372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:06:59,207][00194] Avg episode reward: [(0, '30.539')]
[2024-09-01 17:07:04,198][00194] Fps is (10 sec: 409.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6750208. Throughput: 0: 216.5. Samples: 185552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:04,207][00194] Avg episode reward: [(0, '29.625')]
[2024-09-01 17:07:05,019][37549] Updated weights for policy 0, policy_version 1649 (0.1978)
[2024-09-01 17:07:08,288][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001650_6758400.pth...
[2024-09-01 17:07:08,391][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth
[2024-09-01 17:07:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6758400. Throughput: 0: 223.8. Samples: 186930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:09,201][00194] Avg episode reward: [(0, '30.139')]
[2024-09-01 17:07:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6762496. Throughput: 0: 216.4. Samples: 187952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:14,205][00194] Avg episode reward: [(0, '30.144')]
[2024-09-01 17:07:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6766592. Throughput: 0: 223.5. Samples: 188966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:19,205][00194] Avg episode reward: [(0, '31.555')]
[2024-09-01 17:07:24,210][00194] Fps is (10 sec: 818.2, 60 sec: 887.3, 300 sec: 902.5). Total num frames: 6770688. Throughput: 0: 210.1. Samples: 190100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:24,214][00194] Avg episode reward: [(0, '30.649')]
[2024-09-01 17:07:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6774784. Throughput: 0: 231.9. Samples: 192018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:29,202][00194] Avg episode reward: [(0, '30.857')]
[2024-09-01 17:07:34,198][00194] Fps is (10 sec: 820.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6778880. Throughput: 0: 239.6. Samples: 192844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:34,202][00194] Avg episode reward: [(0, '30.462')]
[2024-09-01 17:07:39,199][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6782976. Throughput: 0: 221.7. Samples: 193810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:39,208][00194] Avg episode reward: [(0, '30.377')]
[2024-09-01 17:07:44,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6791168. Throughput: 0: 238.9. Samples: 195124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:07:44,206][00194] Avg episode reward: [(0, '30.595')]
[2024-09-01 17:07:48,197][37549] Updated weights for policy 0, policy_version 1659 (0.1941)
[2024-09-01 17:07:49,200][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6795264. Throughput: 0: 232.6. Samples: 196020. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:07:49,205][00194] Avg episode reward: [(0, '30.095')]
[2024-09-01 17:07:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6799360. Throughput: 0: 227.5. Samples: 197168. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:54,203][00194] Avg episode reward: [(0, '29.899')]
[2024-09-01 17:07:59,198][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6803456. Throughput: 0: 233.5. Samples: 198458. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:07:59,202][00194] Avg episode reward: [(0, '30.231')]
[2024-09-01 17:08:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6807552. Throughput: 0: 225.2. Samples: 199100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:04,201][00194] Avg episode reward: [(0, '30.164')]
[2024-09-01 17:08:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6811648. Throughput: 0: 241.3. Samples: 200956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:09,201][00194] Avg episode reward: [(0, '30.023')]
[2024-09-01 17:08:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6815744. Throughput: 0: 220.0. Samples: 201916. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:08:14,201][00194] Avg episode reward: [(0, '29.978')]
[2024-09-01 17:08:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6819840. Throughput: 0: 216.2. Samples: 202572. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:08:19,201][00194] Avg episode reward: [(0, '29.942')]
[2024-09-01 17:08:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.9, 300 sec: 902.5). Total num frames: 6828032. Throughput: 0: 232.8. Samples: 204286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:08:24,205][00194] Avg episode reward: [(0, '29.932')]
[2024-09-01 17:08:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6832128. Throughput: 0: 222.4. Samples: 205130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:29,203][00194] Avg episode reward: [(0, '28.888')]
[2024-09-01 17:08:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6836224. Throughput: 0: 220.5. Samples: 205944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:34,200][00194] Avg episode reward: [(0, '28.760')]
[2024-09-01 17:08:34,570][37549] Updated weights for policy 0, policy_version 1669 (0.1704)
[2024-09-01 17:08:36,813][37536] Signal inference workers to stop experience collection... (200 times)
[2024-09-01 17:08:36,869][37549] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-09-01 17:08:37,836][37536] Signal inference workers to resume experience collection... (200 times)
[2024-09-01 17:08:37,841][37549] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-09-01 17:08:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6840320. Throughput: 0: 227.7. Samples: 207416. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:39,210][00194] Avg episode reward: [(0, '29.215')]
[2024-09-01 17:08:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6844416. Throughput: 0: 220.9. Samples: 208400. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:44,206][00194] Avg episode reward: [(0, '28.128')]
[2024-09-01 17:08:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6848512. Throughput: 0: 230.5. Samples: 209472. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:08:49,204][00194] Avg episode reward: [(0, '28.509')]
[2024-09-01 17:08:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6852608. Throughput: 0: 219.1. Samples: 210814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:08:54,201][00194] Avg episode reward: [(0, '28.472')]
[2024-09-01 17:08:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6856704. Throughput: 0: 235.8. Samples: 212526. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:08:59,206][00194] Avg episode reward: [(0, '28.559')]
[2024-09-01 17:09:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6860800. Throughput: 0: 235.1. Samples: 213150. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:04,201][00194] Avg episode reward: [(0, '28.391')]
[2024-09-01 17:09:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6864896. Throughput: 0: 219.2. Samples: 214148. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:09:09,204][00194] Avg episode reward: [(0, '28.317')]
[2024-09-01 17:09:09,967][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001677_6868992.pth...
[2024-09-01 17:09:10,089][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001624_6651904.pth
[2024-09-01 17:09:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6873088. Throughput: 0: 235.9. Samples: 215744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:14,200][00194] Avg episode reward: [(0, '27.484')]
[2024-09-01 17:09:17,987][37549] Updated weights for policy 0, policy_version 1679 (0.3795)
[2024-09-01 17:09:19,199][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6877184. Throughput: 0: 236.5. Samples: 216586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:19,202][00194] Avg episode reward: [(0, '27.939')]
[2024-09-01 17:09:24,206][00194] Fps is (10 sec: 818.5, 60 sec: 887.3, 300 sec: 888.6). Total num frames: 6881280. Throughput: 0: 228.3. Samples: 217690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:24,218][00194] Avg episode reward: [(0, '27.466')]
[2024-09-01 17:09:29,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6885376. Throughput: 0: 236.9. Samples: 219060. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:29,200][00194] Avg episode reward: [(0, '27.202')]
[2024-09-01 17:09:34,198][00194] Fps is (10 sec: 819.9, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6889472. Throughput: 0: 228.9. Samples: 219772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:34,204][00194] Avg episode reward: [(0, '26.968')]
[2024-09-01 17:09:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6893568. Throughput: 0: 238.7. Samples: 221554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:39,202][00194] Avg episode reward: [(0, '27.132')]
[2024-09-01 17:09:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6897664. Throughput: 0: 206.0. Samples: 221798. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:44,206][00194] Avg episode reward: [(0, '27.380')]
[2024-09-01 17:09:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6901760. Throughput: 0: 218.5. Samples: 222982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:49,200][00194] Avg episode reward: [(0, '27.201')]
[2024-09-01 17:09:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 6909952. Throughput: 0: 236.1. Samples: 224772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:54,206][00194] Avg episode reward: [(0, '26.973')]
[2024-09-01 17:09:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 6909952. Throughput: 0: 222.4. Samples: 225752. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:09:59,205][00194] Avg episode reward: [(0, '26.738')]
[2024-09-01 17:10:04,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6914048. Throughput: 0: 214.8. Samples: 226252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:10:04,203][00194] Avg episode reward: [(0, '26.787')]
[2024-09-01 17:10:05,080][37549] Updated weights for policy 0, policy_version 1689 (0.2219)
[2024-09-01 17:10:09,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 6922240. Throughput: 0: 224.0. Samples: 227770. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:10:09,201][00194] Avg episode reward: [(0, '27.297')]
[2024-09-01 17:10:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6926336. Throughput: 0: 226.3. Samples: 229242. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:10:14,205][00194] Avg episode reward: [(0, '26.828')]
[2024-09-01 17:10:19,205][00194] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 6930432. Throughput: 0: 222.9. Samples: 229802. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:10:19,208][00194] Avg episode reward: [(0, '27.229')]
[2024-09-01 17:10:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 6934528. Throughput: 0: 211.4. Samples: 231066. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:10:24,201][00194] Avg episode reward: [(0, '26.704')]
[2024-09-01 17:10:29,198][00194] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6938624. Throughput: 0: 243.8. Samples: 232770. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:10:29,206][00194] Avg episode reward: [(0, '26.835')]
[2024-09-01 17:10:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6942720. Throughput: 0: 230.1. Samples: 233338. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:10:34,201][00194] Avg episode reward: [(0, '27.032')]
[2024-09-01 17:10:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6946816. Throughput: 0: 215.4. Samples: 234466. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:10:39,204][00194] Avg episode reward: [(0, '27.675')]
[2024-09-01 17:10:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6950912. Throughput: 0: 226.3. Samples: 235934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:10:44,207][00194] Avg episode reward: [(0, '27.508')]
[2024-09-01 17:10:49,071][37549] Updated weights for policy 0, policy_version 1699 (0.3114)
[2024-09-01 17:10:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 6959104. Throughput: 0: 233.8. Samples: 236772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:10:49,205][00194] Avg episode reward: [(0, '27.149')]
[2024-09-01 17:10:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 6963200. Throughput: 0: 227.2. Samples: 237996. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:10:54,203][00194] Avg episode reward: [(0, '26.227')]
[2024-09-01 17:10:59,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6963200. Throughput: 0: 217.5. Samples: 239028. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:10:59,208][00194] Avg episode reward: [(0, '26.310')]
[2024-09-01 17:11:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 6971392. Throughput: 0: 226.2. Samples: 239980. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:04,206][00194] Avg episode reward: [(0, '25.549')]
[2024-09-01 17:11:06,823][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001703_6975488.pth...
[2024-09-01 17:11:06,936][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001650_6758400.pth
[2024-09-01 17:11:09,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6975488. Throughput: 0: 233.1. Samples: 241556. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:09,202][00194] Avg episode reward: [(0, '26.168')]
[2024-09-01 17:11:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6979584. Throughput: 0: 206.8. Samples: 242076. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:14,205][00194] Avg episode reward: [(0, '26.702')]
[2024-09-01 17:11:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 6983680. Throughput: 0: 217.6. Samples: 243132. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:19,204][00194] Avg episode reward: [(0, '25.922')]
[2024-09-01 17:11:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6987776. Throughput: 0: 232.0. Samples: 244904. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:11:24,200][00194] Avg episode reward: [(0, '25.696')]
[2024-09-01 17:11:29,208][00194] Fps is (10 sec: 818.4, 60 sec: 887.3, 300 sec: 902.5). Total num frames: 6991872. Throughput: 0: 227.1. Samples: 246156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:11:29,214][00194] Avg episode reward: [(0, '25.343')]
[2024-09-01 17:11:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 6995968. Throughput: 0: 219.0. Samples: 246626. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:34,206][00194] Avg episode reward: [(0, '25.408')]
[2024-09-01 17:11:35,987][37549] Updated weights for policy 0, policy_version 1709 (0.2563)
[2024-09-01 17:11:39,198][00194] Fps is (10 sec: 820.0, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7000064. Throughput: 0: 226.0. Samples: 248168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:39,208][00194] Avg episode reward: [(0, '25.012')]
[2024-09-01 17:11:44,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7008256. Throughput: 0: 222.1. Samples: 249024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:44,203][00194] Avg episode reward: [(0, '24.534')]
[2024-09-01 17:11:49,200][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7012352. Throughput: 0: 229.2. Samples: 250294. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:11:49,207][00194] Avg episode reward: [(0, '24.615')]
[2024-09-01 17:11:54,198][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 7012352. Throughput: 0: 212.7. Samples: 251128. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:11:54,201][00194] Avg episode reward: [(0, '25.474')]
[2024-09-01 17:11:59,198][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7020544. Throughput: 0: 231.8. Samples: 252508. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:11:59,201][00194] Avg episode reward: [(0, '26.557')]
[2024-09-01 17:12:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7024640. Throughput: 0: 226.7. Samples: 253332. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:04,205][00194] Avg episode reward: [(0, '26.792')]
[2024-09-01 17:12:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7028736. Throughput: 0: 210.3. Samples: 254368. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:12:09,201][00194] Avg episode reward: [(0, '27.375')]
[2024-09-01 17:12:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7032832. Throughput: 0: 214.2. Samples: 255792. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:12:14,205][00194] Avg episode reward: [(0, '26.904')]
[2024-09-01 17:12:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7036928. Throughput: 0: 218.0. Samples: 256438. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:19,205][00194] Avg episode reward: [(0, '27.813')]
[2024-09-01 17:12:20,789][37549] Updated weights for policy 0, policy_version 1719 (0.1043)
[2024-09-01 17:12:24,199][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7041024. Throughput: 0: 221.0. Samples: 258112. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:24,207][00194] Avg episode reward: [(0, '26.772')]
[2024-09-01 17:12:24,436][37536] Signal inference workers to stop experience collection... (250 times)
[2024-09-01 17:12:24,499][37549] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-09-01 17:12:26,178][37536] Signal inference workers to resume experience collection... (250 times)
[2024-09-01 17:12:26,179][37549] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-09-01 17:12:29,201][00194] Fps is (10 sec: 819.0, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 7045120. Throughput: 0: 225.9. Samples: 259192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:29,207][00194] Avg episode reward: [(0, '27.050')]
[2024-09-01 17:12:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7049216. Throughput: 0: 212.8. Samples: 259870. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:34,201][00194] Avg episode reward: [(0, '26.177')]
[2024-09-01 17:12:39,198][00194] Fps is (10 sec: 1229.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7057408. Throughput: 0: 230.7. Samples: 261508. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:39,206][00194] Avg episode reward: [(0, '26.278')]
[2024-09-01 17:12:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 7057408. Throughput: 0: 222.6. Samples: 262524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:12:44,200][00194] Avg episode reward: [(0, '26.386')]
[2024-09-01 17:12:49,198][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 7061504. Throughput: 0: 219.1. Samples: 263192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:12:49,207][00194] Avg episode reward: [(0, '26.884')]
[2024-09-01 17:12:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7069696. Throughput: 0: 228.2. Samples: 264638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:12:54,201][00194] Avg episode reward: [(0, '26.931')]
[2024-09-01 17:12:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7073792. Throughput: 0: 219.7. Samples: 265678. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:12:59,200][00194] Avg episode reward: [(0, '26.965')]
[2024-09-01 17:13:04,199][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7077888. Throughput: 0: 228.3. Samples: 266710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:13:04,202][00194] Avg episode reward: [(0, '27.047')]
[2024-09-01 17:13:07,573][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001729_7081984.pth...
[2024-09-01 17:13:07,578][37549] Updated weights for policy 0, policy_version 1729 (0.1506)
[2024-09-01 17:13:07,689][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001677_6868992.pth
[2024-09-01 17:13:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7081984. Throughput: 0: 218.6. Samples: 267948. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:13:09,204][00194] Avg episode reward: [(0, '27.520')]
[2024-09-01 17:13:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7086080. Throughput: 0: 233.3. Samples: 269688. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:13:14,207][00194] Avg episode reward: [(0, '28.349')]
[2024-09-01 17:13:19,200][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 7090176. Throughput: 0: 229.7. Samples: 270208. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:13:19,207][00194] Avg episode reward: [(0, '28.675')]
[2024-09-01 17:13:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7094272. Throughput: 0: 218.4. Samples: 271336. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:13:24,200][00194] Avg episode reward: [(0, '28.584')]
[2024-09-01 17:13:29,198][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7098368. Throughput: 0: 229.2. Samples: 272838. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:13:29,201][00194] Avg episode reward: [(0, '28.566')]
[2024-09-01 17:13:34,203][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7106560. Throughput: 0: 234.5. Samples: 273744. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:13:34,211][00194] Avg episode reward: [(0, '29.516')]
[2024-09-01 17:13:39,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7110656. Throughput: 0: 228.8. Samples: 274932. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:13:39,203][00194] Avg episode reward: [(0, '29.516')]
[2024-09-01 17:13:44,198][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7114752. Throughput: 0: 234.2. Samples: 276218. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:13:44,207][00194] Avg episode reward: [(0, '29.383')]
[2024-09-01 17:13:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7118848. Throughput: 0: 228.8. Samples: 277006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:13:49,201][00194] Avg episode reward: [(0, '30.007')]
[2024-09-01 17:13:51,090][37549] Updated weights for policy 0, policy_version 1739 (0.2591)
[2024-09-01 17:13:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7122944. Throughput: 0: 236.9. Samples: 278608. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:13:54,202][00194] Avg episode reward: [(0, '30.359')]
[2024-09-01 17:13:59,200][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7127040. Throughput: 0: 208.5. Samples: 279070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:13:59,205][00194] Avg episode reward: [(0, '30.548')]
[2024-09-01 17:14:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7131136. Throughput: 0: 224.5. Samples: 280308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:04,205][00194] Avg episode reward: [(0, '30.544')]
[2024-09-01 17:14:09,198][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7135232. Throughput: 0: 240.4. Samples: 282156. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:09,209][00194] Avg episode reward: [(0, '29.616')]
[2024-09-01 17:14:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7139328. Throughput: 0: 229.6. Samples: 283172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:14,201][00194] Avg episode reward: [(0, '29.761')]
[2024-09-01 17:14:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7143424. Throughput: 0: 221.1. Samples: 283694. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:14:19,207][00194] Avg episode reward: [(0, '29.761')]
[2024-09-01 17:14:24,200][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7151616. Throughput: 0: 227.5. Samples: 285172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:14:24,205][00194] Avg episode reward: [(0, '29.858')]
[2024-09-01 17:14:29,206][00194] Fps is (10 sec: 1227.8, 60 sec: 955.6, 300 sec: 902.5). Total num frames: 7155712. Throughput: 0: 230.8. Samples: 286606. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:14:29,209][00194] Avg episode reward: [(0, '29.628')]
[2024-09-01 17:14:34,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7159808. Throughput: 0: 227.0. Samples: 287222. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:34,206][00194] Avg episode reward: [(0, '29.987')]
[2024-09-01 17:14:37,686][37549] Updated weights for policy 0, policy_version 1749 (0.0536)
[2024-09-01 17:14:39,198][00194] Fps is (10 sec: 819.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7163904. Throughput: 0: 219.1. Samples: 288468. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:39,200][00194] Avg episode reward: [(0, '29.408')]
[2024-09-01 17:14:44,199][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7168000. Throughput: 0: 251.2. Samples: 290372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:44,208][00194] Avg episode reward: [(0, '29.231')]
[2024-09-01 17:14:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7172096. Throughput: 0: 235.3. Samples: 290898. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:49,204][00194] Avg episode reward: [(0, '29.996')]
[2024-09-01 17:14:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7176192. Throughput: 0: 216.6. Samples: 291904. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:54,205][00194] Avg episode reward: [(0, '29.969')]
[2024-09-01 17:14:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7180288. Throughput: 0: 229.0. Samples: 293478. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:14:59,205][00194] Avg episode reward: [(0, '30.926')]
[2024-09-01 17:15:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7188480. Throughput: 0: 237.1. Samples: 294364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:15:04,205][00194] Avg episode reward: [(0, '30.845')]
[2024-09-01 17:15:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7188480. Throughput: 0: 225.3. Samples: 295308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:15:09,200][00194] Avg episode reward: [(0, '30.778')]
[2024-09-01 17:15:09,840][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001756_7192576.pth...
[2024-09-01 17:15:10,001][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001703_6975488.pth
[2024-09-01 17:15:14,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7192576. Throughput: 0: 220.9. Samples: 296544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:15:14,205][00194] Avg episode reward: [(0, '30.334')]
[2024-09-01 17:15:19,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7200768. Throughput: 0: 228.0. Samples: 297482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:19,209][00194] Avg episode reward: [(0, '30.568')]
[2024-09-01 17:15:22,083][37549] Updated weights for policy 0, policy_version 1759 (0.1701)
[2024-09-01 17:15:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7204864. Throughput: 0: 230.0. Samples: 298818. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:24,200][00194] Avg episode reward: [(0, '30.491')]
[2024-09-01 17:15:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 7208960. Throughput: 0: 203.9. Samples: 299546. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:15:29,205][00194] Avg episode reward: [(0, '30.572')]
[2024-09-01 17:15:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7213056. Throughput: 0: 216.0. Samples: 300618. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:15:34,206][00194] Avg episode reward: [(0, '30.495')]
[2024-09-01 17:15:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7217152. Throughput: 0: 229.3. Samples: 302224. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:39,210][00194] Avg episode reward: [(0, '30.400')]
[2024-09-01 17:15:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7221248. Throughput: 0: 210.0. Samples: 302926. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:44,201][00194] Avg episode reward: [(0, '30.301')]
[2024-09-01 17:15:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7225344. Throughput: 0: 213.9. Samples: 303988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:49,207][00194] Avg episode reward: [(0, '29.773')]
[2024-09-01 17:15:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7229440. Throughput: 0: 231.0. Samples: 305704. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:54,205][00194] Avg episode reward: [(0, '29.933')]
[2024-09-01 17:15:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7237632. Throughput: 0: 222.6. Samples: 306560. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:15:59,205][00194] Avg episode reward: [(0, '30.305')]
[2024-09-01 17:16:04,200][00194] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7241728. Throughput: 0: 228.3. Samples: 307754. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:16:04,204][00194] Avg episode reward: [(0, '30.679')]
[2024-09-01 17:16:09,151][37549] Updated weights for policy 0, policy_version 1769 (0.1959)
[2024-09-01 17:16:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7245824. Throughput: 0: 222.6. Samples: 308834. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:16:09,205][00194] Avg episode reward: [(0, '29.983')]
[2024-09-01 17:16:11,460][37536] Signal inference workers to stop experience collection... (300 times)
[2024-09-01 17:16:11,506][37549] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2024-09-01 17:16:12,472][37536] Signal inference workers to resume experience collection... (300 times)
[2024-09-01 17:16:12,477][37549] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2024-09-01 17:16:14,198][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7249920. Throughput: 0: 240.9. Samples: 310388. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:16:14,206][00194] Avg episode reward: [(0, '29.527')]
[2024-09-01 17:16:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7254016. Throughput: 0: 229.2. Samples: 310934. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:16:19,207][00194] Avg episode reward: [(0, '29.874')]
[2024-09-01 17:16:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7258112. Throughput: 0: 220.2. Samples: 312132. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:16:24,206][00194] Avg episode reward: [(0, '30.116')]
[2024-09-01 17:16:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7262208. Throughput: 0: 244.0. Samples: 313906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:16:29,209][00194] Avg episode reward: [(0, '28.877')]
[2024-09-01 17:16:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7270400. Throughput: 0: 236.8. Samples: 314644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:16:34,201][00194] Avg episode reward: [(0, '28.785')]
[2024-09-01 17:16:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7270400. Throughput: 0: 225.6. Samples: 315858. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:16:39,200][00194] Avg episode reward: [(0, '29.152')]
[2024-09-01 17:16:44,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7274496. Throughput: 0: 229.9. Samples: 316906. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:16:44,203][00194] Avg episode reward: [(0, '28.908')]
[2024-09-01 17:16:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7282688. Throughput: 0: 225.4. Samples: 317896.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:16:49,207][00194] Avg episode reward: [(0, '28.988')] [2024-09-01 17:16:52,579][37549] Updated weights for policy 0, policy_version 1779 (0.2172) [2024-09-01 17:16:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7286784. Throughput: 0: 227.9. Samples: 319088. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:16:54,205][00194] Avg episode reward: [(0, '29.664')] [2024-09-01 17:16:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7290880. Throughput: 0: 209.0. Samples: 319792. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:16:59,204][00194] Avg episode reward: [(0, '29.981')] [2024-09-01 17:17:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7294976. Throughput: 0: 224.9. Samples: 321054. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:17:04,201][00194] Avg episode reward: [(0, '29.568')] [2024-09-01 17:17:06,497][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001782_7299072.pth... [2024-09-01 17:17:06,607][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001729_7081984.pth [2024-09-01 17:17:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7299072. Throughput: 0: 236.9. Samples: 322794. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:17:09,201][00194] Avg episode reward: [(0, '28.672')] [2024-09-01 17:17:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7303168. Throughput: 0: 205.7. Samples: 323162. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:17:14,201][00194] Avg episode reward: [(0, '28.193')] [2024-09-01 17:17:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7307264. Throughput: 0: 213.7. Samples: 324262. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:19,200][00194] Avg episode reward: [(0, '28.788')] [2024-09-01 17:17:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7311360. Throughput: 0: 227.4. Samples: 326092. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:24,208][00194] Avg episode reward: [(0, '29.006')] [2024-09-01 17:17:29,202][00194] Fps is (10 sec: 1228.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7319552. Throughput: 0: 223.4. Samples: 326958. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:29,206][00194] Avg episode reward: [(0, '28.365')] [2024-09-01 17:17:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7323648. Throughput: 0: 228.4. Samples: 328172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:34,203][00194] Avg episode reward: [(0, '28.126')] [2024-09-01 17:17:38,542][37549] Updated weights for policy 0, policy_version 1789 (0.1821) [2024-09-01 17:17:39,198][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7327744. Throughput: 0: 224.4. Samples: 329188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:39,206][00194] Avg episode reward: [(0, '27.944')] [2024-09-01 17:17:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7331840. Throughput: 0: 248.8. Samples: 330988. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:44,205][00194] Avg episode reward: [(0, '28.004')] [2024-09-01 17:17:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7335936. Throughput: 0: 231.5. Samples: 331472. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:17:49,207][00194] Avg episode reward: [(0, '28.236')] [2024-09-01 17:17:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7340032. Throughput: 0: 219.4. Samples: 332666. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:17:54,204][00194] Avg episode reward: [(0, '27.992')] [2024-09-01 17:17:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7344128. Throughput: 0: 245.3. Samples: 334202. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:17:59,202][00194] Avg episode reward: [(0, '27.754')] [2024-09-01 17:18:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7352320. Throughput: 0: 240.2. Samples: 335070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:04,206][00194] Avg episode reward: [(0, '27.866')] [2024-09-01 17:18:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7352320. Throughput: 0: 226.8. Samples: 336298. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:09,201][00194] Avg episode reward: [(0, '27.234')] [2024-09-01 17:18:14,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7356416. Throughput: 0: 230.5. Samples: 337330. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:18:14,208][00194] Avg episode reward: [(0, '26.876')] [2024-09-01 17:18:19,201][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7364608. Throughput: 0: 227.3. Samples: 338400. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:18:19,207][00194] Avg episode reward: [(0, '27.108')] [2024-09-01 17:18:22,093][37549] Updated weights for policy 0, policy_version 1799 (0.0515) [2024-09-01 17:18:24,203][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7368704. Throughput: 0: 237.7. Samples: 339886. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:18:24,206][00194] Avg episode reward: [(0, '27.137')] [2024-09-01 17:18:29,200][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7372800. Throughput: 0: 210.0. Samples: 340438. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:29,203][00194] Avg episode reward: [(0, '26.826')] [2024-09-01 17:18:34,198][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7376896. Throughput: 0: 221.1. Samples: 341422. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:34,204][00194] Avg episode reward: [(0, '26.894')] [2024-09-01 17:18:39,198][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7380992. Throughput: 0: 237.7. Samples: 343362. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:39,201][00194] Avg episode reward: [(0, '27.610')] [2024-09-01 17:18:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7385088. Throughput: 0: 230.3. Samples: 344564. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:44,201][00194] Avg episode reward: [(0, '27.549')] [2024-09-01 17:18:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7389184. Throughput: 0: 220.8. Samples: 345004. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:49,207][00194] Avg episode reward: [(0, '27.341')] [2024-09-01 17:18:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7397376. Throughput: 0: 229.3. Samples: 346616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:54,206][00194] Avg episode reward: [(0, '26.974')] [2024-09-01 17:18:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7401472. Throughput: 0: 235.8. Samples: 347942. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:18:59,205][00194] Avg episode reward: [(0, '26.623')] [2024-09-01 17:19:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 7405568. Throughput: 0: 228.1. Samples: 348662. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:04,205][00194] Avg episode reward: [(0, '26.623')] [2024-09-01 17:19:08,422][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001809_7409664.pth... [2024-09-01 17:19:08,426][37549] Updated weights for policy 0, policy_version 1809 (0.1478) [2024-09-01 17:19:08,536][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001756_7192576.pth [2024-09-01 17:19:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7409664. Throughput: 0: 217.9. Samples: 349690. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:09,206][00194] Avg episode reward: [(0, '26.946')] [2024-09-01 17:19:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7413760. Throughput: 0: 246.2. Samples: 351516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:14,200][00194] Avg episode reward: [(0, '26.895')] [2024-09-01 17:19:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7417856. Throughput: 0: 235.4. Samples: 352016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:19,203][00194] Avg episode reward: [(0, '27.032')] [2024-09-01 17:19:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7421952. Throughput: 0: 218.0. Samples: 353170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:24,203][00194] Avg episode reward: [(0, '27.681')] [2024-09-01 17:19:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7426048. Throughput: 0: 225.5. Samples: 354712. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:29,201][00194] Avg episode reward: [(0, '27.789')] [2024-09-01 17:19:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7434240. Throughput: 0: 231.5. Samples: 355422. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:19:34,200][00194] Avg episode reward: [(0, '27.641')] [2024-09-01 17:19:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7434240. Throughput: 0: 221.4. Samples: 356580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:19:39,201][00194] Avg episode reward: [(0, '27.798')] [2024-09-01 17:19:44,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7438336. Throughput: 0: 219.4. Samples: 357814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:44,203][00194] Avg episode reward: [(0, '27.194')] [2024-09-01 17:19:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7442432. Throughput: 0: 220.7. Samples: 358592. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:19:49,205][00194] Avg episode reward: [(0, '27.125')] [2024-09-01 17:19:53,469][37549] Updated weights for policy 0, policy_version 1819 (0.2912) [2024-09-01 17:19:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 7450624. Throughput: 0: 227.7. Samples: 359936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:19:54,205][00194] Avg episode reward: [(0, '27.561')] [2024-09-01 17:19:57,289][37536] Signal inference workers to stop experience collection... (350 times) [2024-09-01 17:19:57,392][37549] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-09-01 17:19:58,421][37536] Signal inference workers to resume experience collection... (350 times) [2024-09-01 17:19:58,423][37549] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-09-01 17:19:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7454720. Throughput: 0: 203.3. Samples: 360664. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:19:59,200][00194] Avg episode reward: [(0, '26.813')] [2024-09-01 17:20:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 7458816. Throughput: 0: 218.8. Samples: 361864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:04,201][00194] Avg episode reward: [(0, '26.448')] [2024-09-01 17:20:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 7462912. Throughput: 0: 223.6. Samples: 363232. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:09,201][00194] Avg episode reward: [(0, '26.848')] [2024-09-01 17:20:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7467008. Throughput: 0: 207.4. Samples: 364044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:14,203][00194] Avg episode reward: [(0, '26.986')] [2024-09-01 17:20:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7471104. Throughput: 0: 216.6. Samples: 365170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:19,203][00194] Avg episode reward: [(0, '26.656')] [2024-09-01 17:20:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7475200. Throughput: 0: 225.0. Samples: 366706. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:24,211][00194] Avg episode reward: [(0, '27.963')] [2024-09-01 17:20:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7479296. Throughput: 0: 229.2. Samples: 368130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:29,208][00194] Avg episode reward: [(0, '28.036')] [2024-09-01 17:20:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 7483392. Throughput: 0: 226.4. Samples: 368778. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:34,200][00194] Avg episode reward: [(0, '27.967')] [2024-09-01 17:20:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7487488. Throughput: 0: 222.9. Samples: 369968. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:39,208][00194] Avg episode reward: [(0, '28.203')] [2024-09-01 17:20:40,012][37549] Updated weights for policy 0, policy_version 1829 (0.0533) [2024-09-01 17:20:44,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7495680. Throughput: 0: 222.4. Samples: 370670. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:44,204][00194] Avg episode reward: [(0, '28.685')] [2024-09-01 17:20:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7499776. Throughput: 0: 228.9. Samples: 372164. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:49,201][00194] Avg episode reward: [(0, '28.633')] [2024-09-01 17:20:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7503872. Throughput: 0: 223.8. Samples: 373304. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:54,205][00194] Avg episode reward: [(0, '28.460')] [2024-09-01 17:20:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7507968. Throughput: 0: 232.7. Samples: 374514. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:20:59,203][00194] Avg episode reward: [(0, '29.085')] [2024-09-01 17:21:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7512064. Throughput: 0: 224.6. Samples: 375278. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:04,201][00194] Avg episode reward: [(0, '28.068')] [2024-09-01 17:21:05,961][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001835_7516160.pth... 
[2024-09-01 17:21:06,070][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001782_7299072.pth [2024-09-01 17:21:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7516160. Throughput: 0: 230.3. Samples: 377068. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:09,204][00194] Avg episode reward: [(0, '29.207')] [2024-09-01 17:21:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7520256. Throughput: 0: 206.4. Samples: 377420. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:14,204][00194] Avg episode reward: [(0, '29.404')] [2024-09-01 17:21:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7524352. Throughput: 0: 221.2. Samples: 378734. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:19,211][00194] Avg episode reward: [(0, '28.917')] [2024-09-01 17:21:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7532544. Throughput: 0: 233.0. Samples: 380452. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:21:24,206][00194] Avg episode reward: [(0, '28.910')] [2024-09-01 17:21:24,377][37549] Updated weights for policy 0, policy_version 1839 (0.2461) [2024-09-01 17:21:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7536640. Throughput: 0: 238.2. Samples: 381388. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:21:29,201][00194] Avg episode reward: [(0, '28.910')] [2024-09-01 17:21:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7540736. Throughput: 0: 217.3. Samples: 381944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:21:34,206][00194] Avg episode reward: [(0, '28.002')] [2024-09-01 17:21:39,203][00194] Fps is (10 sec: 818.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7544832. Throughput: 0: 226.2. Samples: 383482. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:39,206][00194] Avg episode reward: [(0, '27.865')] [2024-09-01 17:21:44,203][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7548928. Throughput: 0: 231.7. Samples: 384940. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:44,214][00194] Avg episode reward: [(0, '27.830')] [2024-09-01 17:21:49,198][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7553024. Throughput: 0: 226.7. Samples: 385478. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:49,205][00194] Avg episode reward: [(0, '27.799')] [2024-09-01 17:21:54,198][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7557120. Throughput: 0: 218.0. Samples: 386876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:54,200][00194] Avg episode reward: [(0, '27.906')] [2024-09-01 17:21:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7561216. Throughput: 0: 247.4. Samples: 388552. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:21:59,206][00194] Avg episode reward: [(0, '27.268')] [2024-09-01 17:22:04,199][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7565312. Throughput: 0: 229.3. Samples: 389054. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:22:04,207][00194] Avg episode reward: [(0, '27.450')] [2024-09-01 17:22:09,205][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7569408. Throughput: 0: 217.2. Samples: 390226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:22:09,214][00194] Avg episode reward: [(0, '27.830')] [2024-09-01 17:22:11,093][37549] Updated weights for policy 0, policy_version 1849 (0.3563) [2024-09-01 17:22:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7573504. Throughput: 0: 227.4. Samples: 391622. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:22:14,205][00194] Avg episode reward: [(0, '27.942')] [2024-09-01 17:22:19,198][00194] Fps is (10 sec: 1229.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7581696. Throughput: 0: 238.1. Samples: 392658. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:19,201][00194] Avg episode reward: [(0, '28.713')] [2024-09-01 17:22:24,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7585792. Throughput: 0: 228.5. Samples: 393764. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:22:24,213][00194] Avg episode reward: [(0, '28.713')] [2024-09-01 17:22:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7589888. Throughput: 0: 226.3. Samples: 395124. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:22:29,201][00194] Avg episode reward: [(0, '28.984')] [2024-09-01 17:22:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7593984. Throughput: 0: 229.8. Samples: 395820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:34,204][00194] Avg episode reward: [(0, '28.628')] [2024-09-01 17:22:39,199][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7598080. Throughput: 0: 231.9. Samples: 397314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:39,206][00194] Avg episode reward: [(0, '28.851')] [2024-09-01 17:22:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7602176. Throughput: 0: 206.0. Samples: 397820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:44,203][00194] Avg episode reward: [(0, '28.853')] [2024-09-01 17:22:49,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7606272. Throughput: 0: 224.4. Samples: 399154. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:49,206][00194] Avg episode reward: [(0, '29.513')] [2024-09-01 17:22:54,069][37549] Updated weights for policy 0, policy_version 1859 (0.1671) [2024-09-01 17:22:54,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7614464. Throughput: 0: 238.7. Samples: 400964. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:54,200][00194] Avg episode reward: [(0, '29.513')] [2024-09-01 17:22:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7614464. Throughput: 0: 230.3. Samples: 401986. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:22:59,201][00194] Avg episode reward: [(0, '29.392')] [2024-09-01 17:23:04,198][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7618560. Throughput: 0: 220.8. Samples: 402596. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:23:04,201][00194] Avg episode reward: [(0, '29.575')] [2024-09-01 17:23:08,636][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001862_7626752.pth... [2024-09-01 17:23:08,752][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001809_7409664.pth [2024-09-01 17:23:09,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 7626752. Throughput: 0: 228.5. Samples: 404046. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:23:09,202][00194] Avg episode reward: [(0, '29.741')] [2024-09-01 17:23:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7630848. Throughput: 0: 221.3. Samples: 405084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:23:14,202][00194] Avg episode reward: [(0, '30.491')] [2024-09-01 17:23:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7634944. Throughput: 0: 227.3. Samples: 406050. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:23:19,201][00194] Avg episode reward: [(0, '30.177')] [2024-09-01 17:23:24,200][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7639040. Throughput: 0: 218.5. Samples: 407146. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 17:23:24,205][00194] Avg episode reward: [(0, '29.859')] [2024-09-01 17:23:29,199][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7643136. Throughput: 0: 252.6. Samples: 409188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:23:29,205][00194] Avg episode reward: [(0, '29.280')] [2024-09-01 17:23:34,199][00194] Fps is (10 sec: 819.3, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7647232. Throughput: 0: 228.2. Samples: 409424. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:23:34,202][00194] Avg episode reward: [(0, '29.287')] [2024-09-01 17:23:39,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7651328. Throughput: 0: 215.5. Samples: 410662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:23:39,200][00194] Avg episode reward: [(0, '28.806')] [2024-09-01 17:23:41,113][37549] Updated weights for policy 0, policy_version 1869 (0.1741) [2024-09-01 17:23:43,439][37536] Signal inference workers to stop experience collection... (400 times) [2024-09-01 17:23:43,494][37549] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-09-01 17:23:44,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7655424. Throughput: 0: 228.3. Samples: 412258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:23:44,207][00194] Avg episode reward: [(0, '28.693')] [2024-09-01 17:23:44,871][37536] Signal inference workers to resume experience collection... 
(400 times)
[2024-09-01 17:23:44,871][37549] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-09-01 17:23:49,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7663616. Throughput: 0: 232.3. Samples: 413048. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:23:49,207][00194] Avg episode reward: [(0, '29.477')]
[2024-09-01 17:23:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 7663616. Throughput: 0: 227.8. Samples: 414296. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:23:54,201][00194] Avg episode reward: [(0, '29.477')]
[2024-09-01 17:23:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7671808. Throughput: 0: 215.0. Samples: 414760. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:23:59,200][00194] Avg episode reward: [(0, '29.742')]
[2024-09-01 17:24:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7675904. Throughput: 0: 226.7. Samples: 416250. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:24:04,203][00194] Avg episode reward: [(0, '29.901')]
[2024-09-01 17:24:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7680000. Throughput: 0: 232.5. Samples: 417610. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:24:09,205][00194] Avg episode reward: [(0, '29.902')]
[2024-09-01 17:24:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7684096. Throughput: 0: 209.9. Samples: 418632. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:24:14,204][00194] Avg episode reward: [(0, '30.793')]
[2024-09-01 17:24:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7688192. Throughput: 0: 218.8. Samples: 419270. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:24:19,207][00194] Avg episode reward: [(0, '31.206')]
[2024-09-01 17:24:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7692288. Throughput: 0: 235.3. Samples: 421250. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:24,209][00194] Avg episode reward: [(0, '31.948')]
[2024-09-01 17:24:24,547][37549] Updated weights for policy 0, policy_version 1879 (0.1484)
[2024-09-01 17:24:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7696384. Throughput: 0: 222.9. Samples: 422290. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:29,201][00194] Avg episode reward: [(0, '32.265')]
[2024-09-01 17:24:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7700480. Throughput: 0: 216.8. Samples: 422806. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:34,207][00194] Avg episode reward: [(0, '32.316')]
[2024-09-01 17:24:34,995][37536] Saving new best policy, reward=32.265!
[2024-09-01 17:24:38,990][37536] Saving new best policy, reward=32.316!
[2024-09-01 17:24:39,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7708672. Throughput: 0: 223.6. Samples: 424358. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:39,207][00194] Avg episode reward: [(0, '32.197')]
[2024-09-01 17:24:44,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7712768. Throughput: 0: 232.9. Samples: 425240. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:44,204][00194] Avg episode reward: [(0, '31.975')]
[2024-09-01 17:24:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7716864. Throughput: 0: 226.1. Samples: 426426. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:49,204][00194] Avg episode reward: [(0, '31.570')]
[2024-09-01 17:24:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7720960. Throughput: 0: 217.9. Samples: 427414. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:54,201][00194] Avg episode reward: [(0, '31.493')]
[2024-09-01 17:24:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7725056. Throughput: 0: 234.2. Samples: 429170. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:24:59,209][00194] Avg episode reward: [(0, '30.829')]
[2024-09-01 17:25:04,199][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7729152. Throughput: 0: 234.7. Samples: 429834. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:25:04,204][00194] Avg episode reward: [(0, '30.202')]
[2024-09-01 17:25:06,802][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001888_7733248.pth...
[2024-09-01 17:25:06,967][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001835_7516160.pth
[2024-09-01 17:25:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7733248. Throughput: 0: 212.4. Samples: 430808. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:25:09,202][00194] Avg episode reward: [(0, '30.223')]
[2024-09-01 17:25:11,728][37549] Updated weights for policy 0, policy_version 1889 (0.0525)
[2024-09-01 17:25:14,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7737344. Throughput: 0: 227.9. Samples: 432544. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:25:14,208][00194] Avg episode reward: [(0, '29.535')]
[2024-09-01 17:25:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7741440. Throughput: 0: 228.8. Samples: 433104. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:25:19,211][00194] Avg episode reward: [(0, '29.327')]
[2024-09-01 17:25:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7745536. Throughput: 0: 228.8. Samples: 434656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:25:24,201][00194] Avg episode reward: [(0, '29.306')]
[2024-09-01 17:25:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7749632. Throughput: 0: 233.5. Samples: 435746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:25:29,201][00194] Avg episode reward: [(0, '28.817')]
[2024-09-01 17:25:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7757824. Throughput: 0: 225.8. Samples: 436586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:25:34,200][00194] Avg episode reward: [(0, '28.491')]
[2024-09-01 17:25:39,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7761920. Throughput: 0: 236.1. Samples: 438040. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:25:39,203][00194] Avg episode reward: [(0, '28.144')]
[2024-09-01 17:25:44,200][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7766016. Throughput: 0: 219.6. Samples: 439054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:25:44,202][00194] Avg episode reward: [(0, '28.662')]
[2024-09-01 17:25:49,200][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7770112. Throughput: 0: 223.0. Samples: 439870. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:25:49,208][00194] Avg episode reward: [(0, '28.729')]
[2024-09-01 17:25:54,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7774208. Throughput: 0: 233.5. Samples: 441316. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:25:54,206][00194] Avg episode reward: [(0, '29.106')]
[2024-09-01 17:25:55,914][37549] Updated weights for policy 0, policy_version 1899 (0.2017)
[2024-09-01 17:25:59,201][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7778304. Throughput: 0: 209.5. Samples: 441972. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:25:59,203][00194] Avg episode reward: [(0, '29.625')]
[2024-09-01 17:26:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7782400. Throughput: 0: 224.3. Samples: 443198. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:04,204][00194] Avg episode reward: [(0, '29.178')]
[2024-09-01 17:26:09,198][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7786496. Throughput: 0: 227.6. Samples: 444896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:09,207][00194] Avg episode reward: [(0, '30.286')]
[2024-09-01 17:26:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7794688. Throughput: 0: 231.6. Samples: 446166. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:14,200][00194] Avg episode reward: [(0, '30.009')]
[2024-09-01 17:26:19,200][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7798784. Throughput: 0: 229.0. Samples: 446892. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:19,205][00194] Avg episode reward: [(0, '30.304')]
[2024-09-01 17:26:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7802880. Throughput: 0: 221.5. Samples: 448006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:24,206][00194] Avg episode reward: [(0, '29.718')]
[2024-09-01 17:26:29,198][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7806976. Throughput: 0: 234.7. Samples: 449616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:29,206][00194] Avg episode reward: [(0, '29.618')]
[2024-09-01 17:26:34,200][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7811072. Throughput: 0: 227.6. Samples: 450114. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:34,210][00194] Avg episode reward: [(0, '29.487')]
[2024-09-01 17:26:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7815168. Throughput: 0: 222.7. Samples: 451336. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:39,205][00194] Avg episode reward: [(0, '29.073')]
[2024-09-01 17:26:42,687][37549] Updated weights for policy 0, policy_version 1909 (0.0986)
[2024-09-01 17:26:44,198][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7819264. Throughput: 0: 239.8. Samples: 452762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:44,202][00194] Avg episode reward: [(0, '29.132')]
[2024-09-01 17:26:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7823360. Throughput: 0: 226.9. Samples: 453408. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:49,206][00194] Avg episode reward: [(0, '28.623')]
[2024-09-01 17:26:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7827456. Throughput: 0: 223.5. Samples: 454954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:54,205][00194] Avg episode reward: [(0, '28.261')]
[2024-09-01 17:26:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7831552. Throughput: 0: 220.0. Samples: 456064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:26:59,206][00194] Avg episode reward: [(0, '28.467')]
[2024-09-01 17:27:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7835648. Throughput: 0: 222.5. Samples: 456904. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:04,210][00194] Avg episode reward: [(0, '28.194')]
[2024-09-01 17:27:08,081][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001915_7843840.pth...
[2024-09-01 17:27:08,189][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001862_7626752.pth
[2024-09-01 17:27:09,201][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7843840. Throughput: 0: 228.4. Samples: 458284. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:09,204][00194] Avg episode reward: [(0, '28.159')]
[2024-09-01 17:27:14,198][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7847936. Throughput: 0: 211.2. Samples: 459118. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:14,203][00194] Avg episode reward: [(0, '28.261')]
[2024-09-01 17:27:19,198][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7852032. Throughput: 0: 219.5. Samples: 459992. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:19,208][00194] Avg episode reward: [(0, '28.091')]
[2024-09-01 17:27:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7856128. Throughput: 0: 228.7. Samples: 461626. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:27:24,209][00194] Avg episode reward: [(0, '28.089')]
[2024-09-01 17:27:26,755][37549] Updated weights for policy 0, policy_version 1919 (0.3137)
[2024-09-01 17:27:29,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7860224. Throughput: 0: 213.1. Samples: 462350. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:27:29,206][00194] Avg episode reward: [(0, '26.991')]
[2024-09-01 17:27:30,762][37536] Signal inference workers to stop experience collection...
(450 times)
[2024-09-01 17:27:30,851][37549] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-09-01 17:27:31,673][37536] Signal inference workers to resume experience collection... (450 times)
[2024-09-01 17:27:31,674][37549] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-09-01 17:27:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7864320. Throughput: 0: 222.4. Samples: 463416. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:34,207][00194] Avg episode reward: [(0, '26.615')]
[2024-09-01 17:27:39,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7868416. Throughput: 0: 223.6. Samples: 465018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:39,206][00194] Avg episode reward: [(0, '27.236')]
[2024-09-01 17:27:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7872512. Throughput: 0: 230.7. Samples: 466444. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:44,203][00194] Avg episode reward: [(0, '26.962')]
[2024-09-01 17:27:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 7876608. Throughput: 0: 231.6. Samples: 467326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:49,200][00194] Avg episode reward: [(0, '27.246')]
[2024-09-01 17:27:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7880704. Throughput: 0: 223.0. Samples: 468318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:54,207][00194] Avg episode reward: [(0, '27.603')]
[2024-09-01 17:27:59,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7888896. Throughput: 0: 233.3. Samples: 469618. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:27:59,201][00194] Avg episode reward: [(0, '27.821')]
[2024-09-01 17:28:04,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7892992. Throughput: 0: 234.1. Samples: 470526. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:28:04,201][00194] Avg episode reward: [(0, '27.715')]
[2024-09-01 17:28:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7897088. Throughput: 0: 223.1. Samples: 471664. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:28:09,205][00194] Avg episode reward: [(0, '28.233')]
[2024-09-01 17:28:13,359][37549] Updated weights for policy 0, policy_version 1929 (0.2122)
[2024-09-01 17:28:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7901184. Throughput: 0: 235.3. Samples: 472940. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:28:14,202][00194] Avg episode reward: [(0, '28.450')]
[2024-09-01 17:28:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7905280. Throughput: 0: 231.4. Samples: 473830. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:28:19,202][00194] Avg episode reward: [(0, '27.097')]
[2024-09-01 17:28:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7909376. Throughput: 0: 232.7. Samples: 475488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:28:24,201][00194] Avg episode reward: [(0, '27.287')]
[2024-09-01 17:28:29,203][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7913472. Throughput: 0: 224.1. Samples: 476530. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:28:29,206][00194] Avg episode reward: [(0, '27.070')]
[2024-09-01 17:28:34,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7917568. Throughput: 0: 222.0. Samples: 477314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:28:34,207][00194] Avg episode reward: [(0, '25.843')]
[2024-09-01 17:28:39,198][00194] Fps is (10 sec: 1229.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7925760. Throughput: 0: 233.8. Samples: 478840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:28:39,201][00194] Avg episode reward: [(0, '26.080')]
[2024-09-01 17:28:44,199][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7929856. Throughput: 0: 217.8. Samples: 479420. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:28:44,202][00194] Avg episode reward: [(0, '26.371')]
[2024-09-01 17:28:49,198][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 7933952. Throughput: 0: 225.3. Samples: 480666. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:28:49,200][00194] Avg episode reward: [(0, '26.704')]
[2024-09-01 17:28:54,198][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7938048. Throughput: 0: 230.4. Samples: 482034. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:28:54,203][00194] Avg episode reward: [(0, '26.370')]
[2024-09-01 17:28:56,667][37549] Updated weights for policy 0, policy_version 1939 (0.1033)
[2024-09-01 17:28:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7942144. Throughput: 0: 222.4. Samples: 482946. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:28:59,204][00194] Avg episode reward: [(0, '26.734')]
[2024-09-01 17:29:04,199][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7946240. Throughput: 0: 226.8. Samples: 484038. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:04,203][00194] Avg episode reward: [(0, '26.819')]
[2024-09-01 17:29:07,056][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001941_7950336.pth...
[2024-09-01 17:29:07,167][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001888_7733248.pth
[2024-09-01 17:29:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7950336. Throughput: 0: 222.1. Samples: 485482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:09,201][00194] Avg episode reward: [(0, '27.216')]
[2024-09-01 17:29:14,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7954432. Throughput: 0: 234.2. Samples: 487068. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:14,201][00194] Avg episode reward: [(0, '27.902')]
[2024-09-01 17:29:19,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7958528. Throughput: 0: 232.5. Samples: 487778. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:19,201][00194] Avg episode reward: [(0, '27.801')]
[2024-09-01 17:29:24,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7962624. Throughput: 0: 222.4. Samples: 488846. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:24,208][00194] Avg episode reward: [(0, '27.869')]
[2024-09-01 17:29:29,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 7970816. Throughput: 0: 224.5. Samples: 489524. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:29,206][00194] Avg episode reward: [(0, '28.050')]
[2024-09-01 17:29:34,198][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 7974912. Throughput: 0: 232.8. Samples: 491142. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:34,208][00194] Avg episode reward: [(0, '29.519')]
[2024-09-01 17:29:39,200][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 7979008. Throughput: 0: 226.1. Samples: 492208. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:29:39,203][00194] Avg episode reward: [(0, '29.740')]
[2024-09-01 17:29:43,763][37549] Updated weights for policy 0, policy_version 1949 (0.1999)
[2024-09-01 17:29:44,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7983104. Throughput: 0: 231.6. Samples: 493368. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:29:44,204][00194] Avg episode reward: [(0, '29.516')]
[2024-09-01 17:29:49,198][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7987200. Throughput: 0: 226.2. Samples: 494216. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:49,201][00194] Avg episode reward: [(0, '30.511')]
[2024-09-01 17:29:54,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7991296. Throughput: 0: 229.4. Samples: 495804. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:54,203][00194] Avg episode reward: [(0, '31.283')]
[2024-09-01 17:29:59,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7995392. Throughput: 0: 204.3. Samples: 496262. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:29:59,202][00194] Avg episode reward: [(0, '30.685')]
[2024-09-01 17:30:04,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 7999488. Throughput: 0: 212.2. Samples: 497326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:30:04,201][00194] Avg episode reward: [(0, '30.852')]
[2024-09-01 17:30:09,198][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 8003584. Throughput: 0: 230.5. Samples: 499218. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:30:09,209][00194] Avg episode reward: [(0, '31.073')]
[2024-09-01 17:30:09,369][37536] Stopping Batcher_0...
[2024-09-01 17:30:09,371][37536] Loop batcher_evt_loop terminating...
[2024-09-01 17:30:09,373][00194] Component Batcher_0 stopped!
[2024-09-01 17:30:09,432][37549] Weights refcount: 2 0
[2024-09-01 17:30:09,435][00194] Component InferenceWorker_p0-w0 stopped!
[2024-09-01 17:30:09,445][37549] Stopping InferenceWorker_p0-w0...
[2024-09-01 17:30:09,445][37549] Loop inference_proc0-0_evt_loop terminating...
[2024-09-01 17:30:09,883][37555] Stopping RolloutWorker_w0...
[2024-09-01 17:30:09,883][00194] Component RolloutWorker_w0 stopped!
[2024-09-01 17:30:09,913][37555] Loop rollout_proc0_evt_loop terminating...
[2024-09-01 17:30:09,940][00194] Component RolloutWorker_w2 stopped!
[2024-09-01 17:30:09,941][37556] Stopping RolloutWorker_w2...
[2024-09-01 17:30:09,964][37556] Loop rollout_proc2_evt_loop terminating...
[2024-09-01 17:30:09,981][00194] Component RolloutWorker_w6 stopped!
[2024-09-01 17:30:09,982][37560] Stopping RolloutWorker_w6...
[2024-09-01 17:30:09,998][37560] Loop rollout_proc6_evt_loop terminating...
[2024-09-01 17:30:10,023][37558] Stopping RolloutWorker_w4...
[2024-09-01 17:30:10,023][00194] Component RolloutWorker_w4 stopped!
[2024-09-01 17:30:10,054][00194] Component RolloutWorker_w3 stopped!
[2024-09-01 17:30:10,054][37557] Stopping RolloutWorker_w3...
[2024-09-01 17:30:10,070][37557] Loop rollout_proc3_evt_loop terminating...
[2024-09-01 17:30:10,024][37558] Loop rollout_proc4_evt_loop terminating...
[2024-09-01 17:30:10,081][00194] Component RolloutWorker_w7 stopped!
[2024-09-01 17:30:10,085][37561] Stopping RolloutWorker_w7...
[2024-09-01 17:30:10,086][37561] Loop rollout_proc7_evt_loop terminating...
[2024-09-01 17:30:10,104][00194] Component RolloutWorker_w1 stopped!
[2024-09-01 17:30:10,115][00194] Component RolloutWorker_w5 stopped!
[2024-09-01 17:30:10,114][37559] Stopping RolloutWorker_w5...
[2024-09-01 17:30:10,102][37551] Stopping RolloutWorker_w1...
[2024-09-01 17:30:10,125][37551] Loop rollout_proc1_evt_loop terminating...
[2024-09-01 17:30:10,138][37559] Loop rollout_proc5_evt_loop terminating...
[2024-09-01 17:30:15,456][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth...
[2024-09-01 17:30:15,567][37536] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001915_7843840.pth
[2024-09-01 17:30:15,587][37536] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth...
[2024-09-01 17:30:15,789][37536] Stopping LearnerWorker_p0...
[2024-09-01 17:30:15,789][37536] Loop learner_proc0_evt_loop terminating...
[2024-09-01 17:30:15,793][00194] Component LearnerWorker_p0 stopped!
[2024-09-01 17:30:15,800][00194] Waiting for process learner_proc0 to stop...
[2024-09-01 17:30:16,771][00194] Waiting for process inference_proc0-0 to join...
[2024-09-01 17:30:16,778][00194] Waiting for process rollout_proc0 to join...
[2024-09-01 17:30:17,233][00194] Waiting for process rollout_proc1 to join...
[2024-09-01 17:30:17,238][00194] Waiting for process rollout_proc2 to join...
[2024-09-01 17:30:17,244][00194] Waiting for process rollout_proc3 to join...
[2024-09-01 17:30:17,252][00194] Waiting for process rollout_proc4 to join...
[2024-09-01 17:30:17,261][00194] Waiting for process rollout_proc5 to join...
[2024-09-01 17:30:17,271][00194] Waiting for process rollout_proc6 to join...
[2024-09-01 17:30:17,275][00194] Waiting for process rollout_proc7 to join...
[2024-09-01 17:30:17,282][00194] Batcher 0 profile tree view:
batching: 9.9437, releasing_batches: 0.1301
[2024-09-01 17:30:17,286][00194] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 31.8655
update_model: 95.9111
  weight_update: 0.3019
one_step: 0.0289
  handle_policy_step: 1438.1336
    deserialize: 47.0829, stack: 7.5201, obs_to_device_normalize: 243.6626, forward: 1051.9555, send_messages: 32.9708
    prepare_outputs: 17.3771
      to_cpu: 1.7869
[2024-09-01 17:30:17,289][00194] Learner 0 profile tree view:
misc: 0.0103, prepare_batch: 595.4877
train: 1591.5568
  epoch_init: 0.0037, minibatch_init: 0.0052, losses_postprocess: 0.0927, kl_divergence: 0.3651, after_optimizer: 1.2924
  calculate_losses: 770.7219
    losses_init: 0.0021, forward_head: 682.9694, bptt_initial: 2.0287, tail: 1.8447, advantages_returns: 0.1277, losses: 0.8201
    bptt: 82.6316
      bptt_forward_core: 82.1652
  update: 818.7317
    clip: 1.8504
[2024-09-01 17:30:17,293][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5193, enqueue_policy_requests: 28.1973, env_step: 832.4050, overhead: 19.9107, complete_rollouts: 9.1026
save_policy_outputs: 22.0486
  split_output_tensors: 7.6534
[2024-09-01 17:30:17,294][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3722, enqueue_policy_requests: 27.6687, env_step: 829.5685, overhead: 20.8007, complete_rollouts: 6.8408
save_policy_outputs: 22.4192
  split_output_tensors: 7.1777
[2024-09-01 17:30:17,296][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 17:30:17,300][00194] Runner profile tree view:
main_loop: 2230.9332
[2024-09-01 17:30:17,304][00194] Collected {0: 8011776}, FPS: 894.1
[2024-09-01 17:30:17,355][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 17:30:17,357][00194] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-01 17:30:17,360][00194] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-01 17:30:17,362][00194] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-01 17:30:17,363][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-01 17:30:17,364][00194] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-01 17:30:17,365][00194] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-01 17:30:17,366][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-01 17:30:17,368][00194] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-01 17:30:17,369][00194] Adding new argument 'hf_repository'='jarski/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-01 17:30:17,370][00194] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-01 17:30:17,371][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-01 17:30:17,372][00194] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-01 17:30:17,374][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-01 17:30:17,375][00194] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-01 17:30:17,403][00194] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 17:30:17,407][00194] RunningMeanStd input shape: (1,)
[2024-09-01 17:30:17,429][00194] ConvEncoder: input_channels=3
[2024-09-01 17:30:17,486][00194] Conv encoder output size: 512
[2024-09-01 17:30:17,488][00194] Policy head output size: 512
[2024-09-01 17:30:17,510][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth...
[2024-09-01 17:30:18,088][00194] Num frames 100...
[2024-09-01 17:30:18,333][00194] Num frames 200...
[2024-09-01 17:30:18,603][00194] Num frames 300...
[2024-09-01 17:30:18,822][00194] Num frames 400...
[2024-09-01 17:30:19,060][00194] Num frames 500...
[2024-09-01 17:30:19,277][00194] Num frames 600...
[2024-09-01 17:30:19,491][00194] Num frames 700...
[2024-09-01 17:30:19,706][00194] Num frames 800...
[2024-09-01 17:30:19,938][00194] Num frames 900...
[2024-09-01 17:30:20,149][00194] Num frames 1000...
[2024-09-01 17:30:20,350][00194] Num frames 1100...
[2024-09-01 17:30:20,560][00194] Num frames 1200...
[2024-09-01 17:30:20,780][00194] Num frames 1300...
[2024-09-01 17:30:21,000][00194] Num frames 1400...
[2024-09-01 17:30:21,222][00194] Num frames 1500...
[2024-09-01 17:30:21,425][00194] Num frames 1600...
[2024-09-01 17:30:21,640][00194] Num frames 1700...
[2024-09-01 17:30:21,859][00194] Num frames 1800...
[2024-09-01 17:30:22,085][00194] Num frames 1900...
[2024-09-01 17:30:22,291][00194] Num frames 2000...
[2024-09-01 17:30:22,503][00194] Num frames 2100...
[2024-09-01 17:30:22,556][00194] Avg episode rewards: #0: 54.999, true rewards: #0: 21.000
[2024-09-01 17:30:22,558][00194] Avg episode reward: 54.999, avg true_objective: 21.000
[2024-09-01 17:30:22,781][00194] Num frames 2200...
[2024-09-01 17:30:23,009][00194] Num frames 2300...
[2024-09-01 17:30:23,232][00194] Num frames 2400...
[2024-09-01 17:30:23,445][00194] Num frames 2500...
[2024-09-01 17:30:23,672][00194] Num frames 2600...
[2024-09-01 17:30:23,824][00194] Avg episode rewards: #0: 33.719, true rewards: #0: 13.220
[2024-09-01 17:30:23,825][00194] Avg episode reward: 33.719, avg true_objective: 13.220
[2024-09-01 17:30:23,947][00194] Num frames 2700...
[2024-09-01 17:30:24,162][00194] Num frames 2800...
[2024-09-01 17:30:24,370][00194] Num frames 2900...
[2024-09-01 17:30:24,589][00194] Num frames 3000...
[2024-09-01 17:30:24,807][00194] Num frames 3100...
[2024-09-01 17:30:25,024][00194] Num frames 3200...
[2024-09-01 17:30:25,242][00194] Num frames 3300...
[2024-09-01 17:30:25,452][00194] Num frames 3400...
[2024-09-01 17:30:25,671][00194] Num frames 3500...
[2024-09-01 17:30:25,896][00194] Num frames 3600...
[2024-09-01 17:30:26,129][00194] Num frames 3700...
[2024-09-01 17:30:26,443][00194] Num frames 3800...
[2024-09-01 17:30:26,747][00194] Num frames 3900...
[2024-09-01 17:30:27,043][00194] Num frames 4000...
[2024-09-01 17:30:27,328][00194] Num frames 4100...
[2024-09-01 17:30:27,599][00194] Num frames 4200...
[2024-09-01 17:30:27,907][00194] Num frames 4300...
[2024-09-01 17:30:28,213][00194] Num frames 4400...
[2024-09-01 17:30:28,519][00194] Num frames 4500...
[2024-09-01 17:30:28,764][00194] Avg episode rewards: #0: 39.213, true rewards: #0: 15.213
[2024-09-01 17:30:28,767][00194] Avg episode reward: 39.213, avg true_objective: 15.213
[2024-09-01 17:30:28,895][00194] Num frames 4600...
[2024-09-01 17:30:29,219][00194] Num frames 4700...
[2024-09-01 17:30:29,450][00194] Num frames 4800...
[2024-09-01 17:30:29,666][00194] Num frames 4900...
[2024-09-01 17:30:29,910][00194] Avg episode rewards: #0: 31.699, true rewards: #0: 12.450
[2024-09-01 17:30:29,912][00194] Avg episode reward: 31.699, avg true_objective: 12.450
[2024-09-01 17:30:29,967][00194] Num frames 5000...
[2024-09-01 17:30:30,209][00194] Num frames 5100...
[2024-09-01 17:30:30,435][00194] Num frames 5200...
[2024-09-01 17:30:30,649][00194] Num frames 5300...
[2024-09-01 17:30:30,859][00194] Num frames 5400...
[2024-09-01 17:30:31,086][00194] Num frames 5500...
[2024-09-01 17:30:31,302][00194] Num frames 5600...
[2024-09-01 17:30:31,517][00194] Num frames 5700...
[2024-09-01 17:30:31,741][00194] Avg episode rewards: #0: 29.360, true rewards: #0: 11.560
[2024-09-01 17:30:31,743][00194] Avg episode reward: 29.360, avg true_objective: 11.560
[2024-09-01 17:30:31,788][00194] Num frames 5800...
[2024-09-01 17:30:32,017][00194] Num frames 5900...
[2024-09-01 17:30:32,219][00194] Num frames 6000...
[2024-09-01 17:30:32,430][00194] Num frames 6100...
[2024-09-01 17:30:32,632][00194] Num frames 6200...
[2024-09-01 17:30:32,851][00194] Num frames 6300...
[2024-09-01 17:30:33,088][00194] Num frames 6400...
[2024-09-01 17:30:33,306][00194] Num frames 6500...
[2024-09-01 17:30:33,528][00194] Num frames 6600...
[2024-09-01 17:30:33,746][00194] Num frames 6700...
[2024-09-01 17:30:33,964][00194] Num frames 6800...
[2024-09-01 17:30:34,196][00194] Num frames 6900...
[2024-09-01 17:30:34,410][00194] Num frames 7000...
[2024-09-01 17:30:34,622][00194] Num frames 7100...
[2024-09-01 17:30:34,827][00194] Num frames 7200...
[2024-09-01 17:30:35,057][00194] Num frames 7300...
[2024-09-01 17:30:35,282][00194] Avg episode rewards: #0: 31.633, true rewards: #0: 12.300
[2024-09-01 17:30:35,284][00194] Avg episode reward: 31.633, avg true_objective: 12.300
[2024-09-01 17:30:35,331][00194] Num frames 7400...
[2024-09-01 17:30:35,543][00194] Num frames 7500...
[2024-09-01 17:30:35,755][00194] Num frames 7600...
[2024-09-01 17:30:35,979][00194] Num frames 7700...
[2024-09-01 17:30:36,211][00194] Num frames 7800...
[2024-09-01 17:30:36,439][00194] Num frames 7900...
[2024-09-01 17:30:36,653][00194] Num frames 8000...
[2024-09-01 17:30:36,867][00194] Num frames 8100...
[2024-09-01 17:30:37,116][00194] Num frames 8200...
[2024-09-01 17:30:37,334][00194] Num frames 8300...
[2024-09-01 17:30:37,553][00194] Num frames 8400...
[2024-09-01 17:30:37,771][00194] Num frames 8500...
[2024-09-01 17:30:38,008][00194] Num frames 8600...
[2024-09-01 17:30:38,242][00194] Num frames 8700...
[2024-09-01 17:30:38,466][00194] Num frames 8800...
[2024-09-01 17:30:38,687][00194] Num frames 8900...
[2024-09-01 17:30:38,909][00194] Num frames 9000...
[2024-09-01 17:30:39,126][00194] Num frames 9100...
[2024-09-01 17:30:39,385][00194] Num frames 9200...
[2024-09-01 17:30:39,686][00194] Num frames 9300...
[2024-09-01 17:30:39,979][00194] Num frames 9400...
[2024-09-01 17:30:40,270][00194] Avg episode rewards: #0: 36.114, true rewards: #0: 13.543
[2024-09-01 17:30:40,276][00194] Avg episode reward: 36.114, avg true_objective: 13.543
[2024-09-01 17:30:40,343][00194] Num frames 9500...
[2024-09-01 17:30:40,624][00194] Num frames 9600...
[2024-09-01 17:30:40,925][00194] Num frames 9700...
[2024-09-01 17:30:41,227][00194] Num frames 9800...
[2024-09-01 17:30:41,559][00194] Num frames 9900...
[2024-09-01 17:30:41,855][00194] Num frames 10000...
[2024-09-01 17:30:42,163][00194] Num frames 10100...
[2024-09-01 17:30:42,470][00194] Num frames 10200...
[2024-09-01 17:30:42,701][00194] Num frames 10300...
[2024-09-01 17:30:42,922][00194] Num frames 10400...
[2024-09-01 17:30:43,152][00194] Num frames 10500...
[2024-09-01 17:30:43,380][00194] Num frames 10600...
[2024-09-01 17:30:43,594][00194] Num frames 10700...
[2024-09-01 17:30:43,682][00194] Avg episode rewards: #0: 35.141, true rewards: #0: 13.391
[2024-09-01 17:30:43,684][00194] Avg episode reward: 35.141, avg true_objective: 13.391
[2024-09-01 17:30:43,862][00194] Num frames 10800...
[2024-09-01 17:30:44,094][00194] Num frames 10900...
[2024-09-01 17:30:44,309][00194] Num frames 11000...
[2024-09-01 17:30:44,531][00194] Num frames 11100...
[2024-09-01 17:30:44,750][00194] Num frames 11200...
[2024-09-01 17:30:44,969][00194] Num frames 11300...
[2024-09-01 17:30:45,191][00194] Num frames 11400...
[2024-09-01 17:30:45,411][00194] Num frames 11500...
[2024-09-01 17:30:45,632][00194] Num frames 11600...
[2024-09-01 17:30:45,851][00194] Num frames 11700...
[2024-09-01 17:30:46,086][00194] Num frames 11800...
[2024-09-01 17:30:46,307][00194] Num frames 11900...
[2024-09-01 17:30:46,547][00194] Num frames 12000...
[2024-09-01 17:30:46,775][00194] Num frames 12100...
[2024-09-01 17:30:47,007][00194] Num frames 12200...
[2024-09-01 17:30:47,225][00194] Num frames 12300...
[2024-09-01 17:30:47,440][00194] Num frames 12400...
[2024-09-01 17:30:47,669][00194] Num frames 12500...
[2024-09-01 17:30:47,891][00194] Num frames 12600...
[2024-09-01 17:30:48,115][00194] Num frames 12700...
[2024-09-01 17:30:48,334][00194] Num frames 12800...
[2024-09-01 17:30:48,425][00194] Avg episode rewards: #0: 37.125, true rewards: #0: 14.237
[2024-09-01 17:30:48,428][00194] Avg episode reward: 37.125, avg true_objective: 14.237
[2024-09-01 17:30:48,632][00194] Num frames 12900...
[2024-09-01 17:30:48,854][00194] Num frames 13000...
[2024-09-01 17:30:49,097][00194] Num frames 13100...
[2024-09-01 17:30:49,327][00194] Num frames 13200...
[2024-09-01 17:30:49,548][00194] Num frames 13300...
[2024-09-01 17:30:49,770][00194] Num frames 13400...
[2024-09-01 17:30:49,991][00194] Num frames 13500...
[2024-09-01 17:30:50,212][00194] Num frames 13600...
[2024-09-01 17:30:50,422][00194] Num frames 13700...
[2024-09-01 17:30:50,648][00194] Num frames 13800...
[2024-09-01 17:30:50,858][00194] Num frames 13900...
[2024-09-01 17:30:51,095][00194] Num frames 14000...
[2024-09-01 17:30:51,310][00194] Num frames 14100...
[2024-09-01 17:30:51,531][00194] Num frames 14200...
[2024-09-01 17:30:51,755][00194] Num frames 14300...
[2024-09-01 17:30:51,973][00194] Num frames 14400...
[2024-09-01 17:30:52,193][00194] Num frames 14500...
[2024-09-01 17:30:52,275][00194] Avg episode rewards: #0: 37.812, true rewards: #0: 14.512
[2024-09-01 17:30:52,278][00194] Avg episode reward: 37.812, avg true_objective: 14.512
[2024-09-01 17:32:32,823][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-09-01 17:32:35,863][00194] The model has been pushed to https://huggingface.co/jarski/rl_course_vizdoom_health_gathering_supreme
[2024-09-01 17:34:43,971][00194] Environment doom_basic already registered, overwriting...
[2024-09-01 17:34:43,975][00194] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-01 17:34:43,979][00194] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-01 17:34:43,981][00194] Environment doom_dm already registered, overwriting...
[2024-09-01 17:34:43,983][00194] Environment doom_dwango5 already registered, overwriting...
[2024-09-01 17:34:43,985][00194] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-09-01 17:34:43,987][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-09-01 17:34:43,989][00194] Environment doom_my_way_home already registered, overwriting...
[2024-09-01 17:34:43,991][00194] Environment doom_deadly_corridor already registered, overwriting...
[2024-09-01 17:34:43,994][00194] Environment doom_defend_the_center already registered, overwriting...
[2024-09-01 17:34:43,996][00194] Environment doom_defend_the_line already registered, overwriting...
[2024-09-01 17:34:43,998][00194] Environment doom_health_gathering already registered, overwriting...
[2024-09-01 17:34:44,000][00194] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-09-01 17:34:44,001][00194] Environment doom_battle already registered, overwriting...
[2024-09-01 17:34:44,003][00194] Environment doom_battle2 already registered, overwriting...
[2024-09-01 17:34:44,005][00194] Environment doom_duel_bots already registered, overwriting...
[2024-09-01 17:34:44,008][00194] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-01 17:34:44,010][00194] Environment doom_duel already registered, overwriting...
[2024-09-01 17:34:44,012][00194] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-01 17:34:44,014][00194] Environment doom_benchmark already registered, overwriting...
[2024-09-01 17:34:44,016][00194] register_encoder_factory: 
[2024-09-01 17:34:44,036][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 17:34:44,037][00194] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
[2024-09-01 17:34:44,053][00194] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-01 17:34:44,063][00194] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-01 17:34:44,064][00194] Weights and Biases integration disabled
[2024-09-01 17:34:44,073][00194] Environment var CUDA_VISIBLE_DEVICES is 
[2024-09-01 17:34:46,068][00194] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-01 17:34:46,072][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
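The configuration entry above records the exact command line and the `train_for_env_steps` override that produced this run. As a hedged sketch only: a re-launch with the same settings would look roughly like the following, where the entry-point module `sf_examples.vizdoom.train_vizdoom` is an assumption based on Sample Factory's bundled ViZDoom examples (it does not appear in this log); the flags are taken verbatim from the `command_line`, `experiment`, `train_dir`, and `restart_behavior` values recorded above.

```shell
# Hypothetical re-launch of the run recorded in this log.
# Entry-point module is assumed; flags come from the logged configuration.
python -m sf_examples.vizdoom.train_vizdoom \
    --env=doom_health_gathering_supreme \
    --num_workers=8 \
    --num_envs_per_worker=4 \
    --device=cpu \
    --train_for_env_steps=10000000 \
    --experiment=default_experiment \
    --train_dir=/content/train_dir \
    --restart_behavior=resume
```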
[2024-09-01 17:34:46,078][00194] Rollout worker 0 uses device cpu
[2024-09-01 17:34:46,080][00194] Rollout worker 1 uses device cpu
[2024-09-01 17:34:46,083][00194] Rollout worker 2 uses device cpu
[2024-09-01 17:34:46,087][00194] Rollout worker 3 uses device cpu
[2024-09-01 17:34:46,089][00194] Rollout worker 4 uses device cpu
[2024-09-01 17:34:46,090][00194] Rollout worker 5 uses device cpu
[2024-09-01 17:34:46,092][00194] Rollout worker 6 uses device cpu
[2024-09-01 17:34:46,093][00194] Rollout worker 7 uses device cpu
[2024-09-01 17:34:46,238][00194] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 17:34:46,277][00194] Starting all processes...
[2024-09-01 17:34:46,278][00194] Starting process learner_proc0
[2024-09-01 17:34:46,332][00194] Starting all processes...
[2024-09-01 17:34:46,340][00194] Starting process inference_proc0-0
[2024-09-01 17:34:46,343][00194] Starting process rollout_proc0
[2024-09-01 17:34:46,343][00194] Starting process rollout_proc1
[2024-09-01 17:34:46,343][00194] Starting process rollout_proc2
[2024-09-01 17:34:46,344][00194] Starting process rollout_proc3
[2024-09-01 17:34:46,349][00194] Starting process rollout_proc4
[2024-09-01 17:34:46,349][00194] Starting process rollout_proc5
[2024-09-01 17:34:46,349][00194] Starting process rollout_proc6
[2024-09-01 17:34:46,356][00194] Starting process rollout_proc7
[2024-09-01 17:35:00,678][47745] Worker 4 uses CPU cores [0]
[2024-09-01 17:35:01,036][47749] Worker 7 uses CPU cores [1]
[2024-09-01 17:35:01,112][47728] Starting seed is not provided
[2024-09-01 17:35:01,114][47728] Initializing actor-critic model on device cpu
[2024-09-01 17:35:01,115][47728] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 17:35:01,118][47728] RunningMeanStd input shape: (1,)
[2024-09-01 17:35:01,190][47728] ConvEncoder: input_channels=3
[2024-09-01 17:35:01,267][47744] Worker 2 uses CPU cores [0]
[2024-09-01 17:35:01,290][47742] Worker 0 uses CPU cores [0]
[2024-09-01 17:35:01,378][47748] Worker 6 uses CPU cores [0]
[2024-09-01 17:35:01,415][47743] Worker 1 uses CPU cores [1]
[2024-09-01 17:35:01,568][47746] Worker 3 uses CPU cores [1]
[2024-09-01 17:35:01,595][47747] Worker 5 uses CPU cores [1]
[2024-09-01 17:35:01,663][47728] Conv encoder output size: 512
[2024-09-01 17:35:01,663][47728] Policy head output size: 512
[2024-09-01 17:35:01,689][47728] Created Actor Critic model with architecture:
[2024-09-01 17:35:01,690][47728] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 17:35:02,269][47728] Using optimizer 
[2024-09-01 17:35:02,271][47728] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth...
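The architecture printout above hides the conv shapes inside scripted modules, but the log does state the observation shape (3, 72, 128) and "Conv encoder output size: 512". A small sketch of the shape arithmetic, assuming the `convnet_simple` layer stack (32 channels with kernel 8/stride 4, then 64 with 4/2, then 128 with 3/2) — those kernel/stride/channel values are an assumption based on Sample Factory's `convnet_simple` and are not printed in this log:

```python
# Shape arithmetic for the conv encoder; layer parameters are assumed
# (Sample Factory convnet_simple), only the 512 output size is in the log.

def conv_out(size, kernel, stride):
    """Output length of a valid (no-padding) convolution along one axis."""
    return (size - kernel) // stride + 1

def encoder_flat_dim(h, w, layers):
    """Flattened feature count after a stack of (channels, kernel, stride) convs."""
    channels = 0
    for channels, kernel, stride in layers:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    return channels * h * w, (h, w)

LAYERS = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]  # assumed convnet_simple stack
flat, hw = encoder_flat_dim(72, 128, LAYERS)    # observations are (3, 72, 128)
print(flat, hw)  # 2304 (3, 6)
```

A final Linear(2304, 512) in `mlp_layers` then produces the 512-dim encoder output reported by the log.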
[2024-09-01 17:35:02,317][47728] Loading model from checkpoint
[2024-09-01 17:35:02,342][47728] Loaded experiment state at self.train_step=1956, self.env_steps=8011776
[2024-09-01 17:35:02,343][47728] Initialized policy 0 weights for model version 1956
[2024-09-01 17:35:02,349][47741] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 17:35:02,351][47728] LearnerWorker_p0 finished initialization!
[2024-09-01 17:35:02,352][47741] RunningMeanStd input shape: (1,)
[2024-09-01 17:35:02,375][47741] ConvEncoder: input_channels=3
[2024-09-01 17:35:02,542][47741] Conv encoder output size: 512
[2024-09-01 17:35:02,544][47741] Policy head output size: 512
[2024-09-01 17:35:02,569][00194] Inference worker 0-0 is ready!
[2024-09-01 17:35:02,571][00194] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 17:35:02,682][47749] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,711][47743] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,717][47746] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,713][47747] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,717][47748] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,733][47745] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,745][47744] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:02,748][47742] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 17:35:04,073][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8011776. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 17:35:04,181][47749] Decorrelating experience for 0 frames...
[2024-09-01 17:35:04,198][47747] Decorrelating experience for 0 frames...
[2024-09-01 17:35:04,196][47744] Decorrelating experience for 0 frames...
[2024-09-01 17:35:04,211][47746] Decorrelating experience for 0 frames...
[2024-09-01 17:35:04,209][47745] Decorrelating experience for 0 frames...
[2024-09-01 17:35:04,215][47742] Decorrelating experience for 0 frames...
[2024-09-01 17:35:05,153][47745] Decorrelating experience for 32 frames...
[2024-09-01 17:35:05,156][47744] Decorrelating experience for 32 frames...
[2024-09-01 17:35:05,490][47749] Decorrelating experience for 32 frames...
[2024-09-01 17:35:05,542][47746] Decorrelating experience for 32 frames...
[2024-09-01 17:35:05,625][47747] Decorrelating experience for 32 frames...
[2024-09-01 17:35:06,217][00194] Heartbeat connected on Batcher_0
[2024-09-01 17:35:06,230][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 17:35:06,261][47749] Decorrelating experience for 64 frames...
[2024-09-01 17:35:06,286][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 17:35:06,751][47742] Decorrelating experience for 32 frames...
[2024-09-01 17:35:06,972][47745] Decorrelating experience for 64 frames...
[2024-09-01 17:35:06,990][47744] Decorrelating experience for 64 frames...
[2024-09-01 17:35:07,222][47748] Decorrelating experience for 0 frames...
[2024-09-01 17:35:07,637][47742] Decorrelating experience for 64 frames...
[2024-09-01 17:35:07,752][47747] Decorrelating experience for 64 frames...
[2024-09-01 17:35:08,502][47749] Decorrelating experience for 96 frames...
[2024-09-01 17:35:08,563][47746] Decorrelating experience for 64 frames...
[2024-09-01 17:35:08,891][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 17:35:08,953][47743] Decorrelating experience for 0 frames...
[2024-09-01 17:35:09,078][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8011776. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 17:35:09,939][47742] Decorrelating experience for 96 frames...
[2024-09-01 17:35:10,501][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 17:35:11,259][47745] Decorrelating experience for 96 frames...
[2024-09-01 17:35:12,147][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 17:35:12,263][47746] Decorrelating experience for 96 frames...
[2024-09-01 17:35:12,468][47743] Decorrelating experience for 32 frames...
[2024-09-01 17:35:13,205][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 17:35:14,074][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8011776. Throughput: 0: 36.2. Samples: 362. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 17:35:14,080][00194] Avg episode reward: [(0, '0.435')]
[2024-09-01 17:35:14,145][47747] Decorrelating experience for 96 frames...
[2024-09-01 17:35:15,130][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 17:35:16,849][47748] Decorrelating experience for 32 frames...
[2024-09-01 17:35:17,911][47743] Decorrelating experience for 64 frames...
[2024-09-01 17:35:19,073][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8011776. Throughput: 0: 102.4. Samples: 1536. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 17:35:19,081][00194] Avg episode reward: [(0, '3.744')]
[2024-09-01 17:35:20,858][47728] Signal inference workers to stop experience collection...
[2024-09-01 17:35:20,896][47741] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 17:35:20,988][47744] Decorrelating experience for 96 frames...
[2024-09-01 17:35:21,148][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 17:35:21,180][47748] Decorrelating experience for 64 frames...
[2024-09-01 17:35:21,284][47743] Decorrelating experience for 96 frames...
[2024-09-01 17:35:21,385][00194] Heartbeat connected on RolloutWorker_w1
[2024-09-01 17:35:21,850][47748] Decorrelating experience for 96 frames...
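In the periodic "Fps is (...)" entries that follow, "Total num frames" advances in multiples of 4096 and the 10-second FPS readings land on values like 409.6, 819.2, and 1228.8. This is consistent with the configuration recorded earlier in this log (batch_size=1024, env_frameskip=4): each learner update consumes 1024 transitions, each covering 4 game frames. The interpretation is ours, not a statement from Sample Factory, but the arithmetic checks out:

```python
# Frame accounting implied by the logged config (batch_size=1024, env_frameskip=4).
# Interpretation of the log's numbers, not Sample Factory internals.
batch_size = 1024        # transitions consumed per learner update
env_frameskip = 4        # game frames covered by each transition
frames_per_update = batch_size * env_frameskip

print(frames_per_update)                # 4096: step size of "Total num frames"
print(1 * frames_per_update / 10)       # 409.6: one update in a 10 s window
print(2 * frames_per_update / 10)       # 819.2: two updates in a 10 s window
print(3 * frames_per_update / 10)       # 1228.8: three updates in a 10 s window
```

This is why the 10-second FPS column only ever shows multiples of 409.6: the frame counter is quantized to whole learner updates.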
[2024-09-01 17:35:21,959][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 17:35:22,289][47728] Signal inference workers to resume experience collection...
[2024-09-01 17:35:22,289][47741] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 17:35:24,073][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 8015872. Throughput: 0: 137.8. Samples: 2756. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 17:35:24,076][00194] Avg episode reward: [(0, '4.928')]
[2024-09-01 17:35:29,092][00194] Fps is (10 sec: 817.7, 60 sec: 327.4, 300 sec: 327.4). Total num frames: 8019968. Throughput: 0: 142.9. Samples: 3576. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 17:35:29,097][00194] Avg episode reward: [(0, '6.433')]
[2024-09-01 17:35:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 8024064. Throughput: 0: 158.4. Samples: 4752. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:35:34,076][00194] Avg episode reward: [(0, '6.413')]
[2024-09-01 17:35:39,074][00194] Fps is (10 sec: 820.7, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 8028160. Throughput: 0: 171.1. Samples: 5988. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:35:39,090][00194] Avg episode reward: [(0, '7.944')]
[2024-09-01 17:35:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 8032256. Throughput: 0: 166.9. Samples: 6678. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:35:44,077][00194] Avg episode reward: [(0, '8.864')]
[2024-09-01 17:35:49,073][00194] Fps is (10 sec: 819.3, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8036352. Throughput: 0: 181.3. Samples: 8160. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:35:49,077][00194] Avg episode reward: [(0, '9.813')]
[2024-09-01 17:35:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 8040448. Throughput: 0: 206.5. Samples: 9290. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:35:54,075][00194] Avg episode reward: [(0, '11.407')]
[2024-09-01 17:35:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 595.8, 300 sec: 595.8). Total num frames: 8044544. Throughput: 0: 215.6. Samples: 10064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:35:59,083][00194] Avg episode reward: [(0, '13.735')]
[2024-09-01 17:36:03,866][47741] Updated weights for policy 0, policy_version 1966 (0.2506)
[2024-09-01 17:36:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 8052736. Throughput: 0: 229.0. Samples: 11842. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:36:04,075][00194] Avg episode reward: [(0, '15.147')]
[2024-09-01 17:36:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 751.0, 300 sec: 693.2). Total num frames: 8056832. Throughput: 0: 223.6. Samples: 12820. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:36:09,080][00194] Avg episode reward: [(0, '17.124')]
[2024-09-01 17:36:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 8060928. Throughput: 0: 221.3. Samples: 13532. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:36:14,076][00194] Avg episode reward: [(0, '17.111')]
[2024-09-01 17:36:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 710.0). Total num frames: 8065024. Throughput: 0: 225.3. Samples: 14890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:36:19,075][00194] Avg episode reward: [(0, '19.737')]
[2024-09-01 17:36:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 716.8). Total num frames: 8069120. Throughput: 0: 238.0. Samples: 16696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:36:24,075][00194] Avg episode reward: [(0, '21.668')]
[2024-09-01 17:36:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.7, 300 sec: 722.8). Total num frames: 8073216. Throughput: 0: 228.7. Samples: 16968. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:36:29,080][00194] Avg episode reward: [(0, '22.343')]
[2024-09-01 17:36:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 8077312. Throughput: 0: 222.9. Samples: 18192. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:36:34,081][00194] Avg episode reward: [(0, '24.009')]
[2024-09-01 17:36:39,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 776.1). Total num frames: 8085504. Throughput: 0: 237.8. Samples: 19992. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:36:39,076][00194] Avg episode reward: [(0, '24.846')]
[2024-09-01 17:36:44,076][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 778.2). Total num frames: 8089600. Throughput: 0: 237.9. Samples: 20770. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:36:44,079][00194] Avg episode reward: [(0, '24.920')]
[2024-09-01 17:36:49,073][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 741.2). Total num frames: 8089600. Throughput: 0: 223.4. Samples: 21896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:36:49,081][00194] Avg episode reward: [(0, '24.920')]
[2024-09-01 17:36:49,460][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001976_8093696.pth...
[2024-09-01 17:36:49,465][47741] Updated weights for policy 0, policy_version 1976 (0.1945)
[2024-09-01 17:36:49,579][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001941_7950336.pth
[2024-09-01 17:36:54,076][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 781.9). Total num frames: 8097792. Throughput: 0: 229.1. Samples: 23130. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:36:54,081][00194] Avg episode reward: [(0, '25.429')]
[2024-09-01 17:36:59,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 783.6). Total num frames: 8101888. Throughput: 0: 234.5. Samples: 24086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:36:59,076][00194] Avg episode reward: [(0, '26.293')]
[2024-09-01 17:37:04,079][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 785.0). Total num frames: 8105984. Throughput: 0: 232.2. Samples: 25340. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:37:04,086][00194] Avg episode reward: [(0, '25.988')]
[2024-09-01 17:37:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 786.4). Total num frames: 8110080. Throughput: 0: 218.6. Samples: 26532. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:37:09,078][00194] Avg episode reward: [(0, '26.540')]
[2024-09-01 17:37:14,073][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 787.7). Total num frames: 8114176. Throughput: 0: 232.6. Samples: 27436. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:37:14,081][00194] Avg episode reward: [(0, '26.060')]
[2024-09-01 17:37:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 788.9). Total num frames: 8118272. Throughput: 0: 246.3. Samples: 29276. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:37:19,081][00194] Avg episode reward: [(0, '26.599')]
[2024-09-01 17:37:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 789.9). Total num frames: 8122368. Throughput: 0: 227.8. Samples: 30244. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:37:24,079][00194] Avg episode reward: [(0, '26.365')]
[2024-09-01 17:37:29,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 8130560. Throughput: 0: 223.6. Samples: 30830. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:37:29,080][00194] Avg episode reward: [(0, '26.365')]
[2024-09-01 17:37:32,238][47741] Updated weights for policy 0, policy_version 1986 (0.1956)
[2024-09-01 17:37:34,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 8134656. Throughput: 0: 234.1. Samples: 32430. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:37:34,081][00194] Avg episode reward: [(0, '26.898')]
[2024-09-01 17:37:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8138752. Throughput: 0: 236.5. Samples: 33774. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:37:39,079][00194] Avg episode reward: [(0, '27.384')]
[2024-09-01 17:37:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8142848. Throughput: 0: 227.5. Samples: 34322. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:37:44,076][00194] Avg episode reward: [(0, '28.181')]
[2024-09-01 17:37:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 8146944. Throughput: 0: 237.6. Samples: 36030. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:37:49,082][00194] Avg episode reward: [(0, '29.622')]
[2024-09-01 17:37:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8151040. Throughput: 0: 242.1. Samples: 37426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:37:54,076][00194] Avg episode reward: [(0, '30.945')]
[2024-09-01 17:37:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 8155136. Throughput: 0: 234.9. Samples: 38006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:37:59,076][00194] Avg episode reward: [(0, '31.694')]
[2024-09-01 17:38:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 842.0). Total num frames: 8163328. Throughput: 0: 226.4. Samples: 39464. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:38:04,085][00194] Avg episode reward: [(0, '32.151')]
[2024-09-01 17:38:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 841.3). Total num frames: 8167424. Throughput: 0: 237.5. Samples: 40932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:38:09,080][00194] Avg episode reward: [(0, '32.851')]
[2024-09-01 17:38:11,855][47728] Saving new best policy, reward=32.851!
[2024-09-01 17:38:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 840.8). Total num frames: 8171520. Throughput: 0: 236.7. Samples: 41480. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:14,079][00194] Avg episode reward: [(0, '32.978')]
[2024-09-01 17:38:17,536][47728] Saving new best policy, reward=32.978!
[2024-09-01 17:38:17,549][47741] Updated weights for policy 0, policy_version 1996 (0.1004)
[2024-09-01 17:38:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 840.2). Total num frames: 8175616. Throughput: 0: 224.2. Samples: 42518. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:19,077][00194] Avg episode reward: [(0, '32.800')]
[2024-09-01 17:38:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 839.7). Total num frames: 8179712. Throughput: 0: 236.0. Samples: 44394. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:24,082][00194] Avg episode reward: [(0, '31.144')]
[2024-09-01 17:38:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 8183808. Throughput: 0: 236.1. Samples: 44946. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:29,076][00194] Avg episode reward: [(0, '30.769')]
[2024-09-01 17:38:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.7). Total num frames: 8187904. Throughput: 0: 230.0. Samples: 46380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:34,076][00194] Avg episode reward: [(0, '30.520')]
[2024-09-01 17:38:39,081][00194] Fps is (10 sec: 1227.8, 60 sec: 955.6, 300 sec: 857.3). Total num frames: 8196096. Throughput: 0: 226.1. Samples: 47602. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:39,086][00194] Avg episode reward: [(0, '29.990')]
[2024-09-01 17:38:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 856.4). Total num frames: 8200192. Throughput: 0: 236.8. Samples: 48662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:44,076][00194] Avg episode reward: [(0, '29.219')]
[2024-09-01 17:38:46,962][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002003_8204288.pth...
[2024-09-01 17:38:47,084][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001956_8011776.pth
[2024-09-01 17:38:49,073][00194] Fps is (10 sec: 819.8, 60 sec: 955.7, 300 sec: 855.6). Total num frames: 8204288. Throughput: 0: 230.5. Samples: 49836. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:49,081][00194] Avg episode reward: [(0, '28.558')]
[2024-09-01 17:38:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 854.8). Total num frames: 8208384. Throughput: 0: 221.8. Samples: 50912. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:38:54,078][00194] Avg episode reward: [(0, '28.558')]
[2024-09-01 17:38:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 854.1). Total num frames: 8212480. Throughput: 0: 227.5. Samples: 51718. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:38:59,077][00194] Avg episode reward: [(0, '28.676')]
[2024-09-01 17:39:00,708][47741] Updated weights for policy 0, policy_version 2006 (0.0984)
[2024-09-01 17:39:02,980][47728] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 17:39:03,022][47741] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 17:39:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 853.3). Total num frames: 8216576. Throughput: 0: 245.3. Samples: 53556. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:04,078][00194] Avg episode reward: [(0, '28.719')]
[2024-09-01 17:39:04,605][47728] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 17:39:04,605][47741] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 17:39:09,076][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 852.6). Total num frames: 8220672. Throughput: 0: 229.5. Samples: 54724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:09,082][00194] Avg episode reward: [(0, '28.698')]
[2024-09-01 17:39:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 852.0). Total num frames: 8224768. Throughput: 0: 225.4. Samples: 55088. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:14,076][00194] Avg episode reward: [(0, '29.010')]
[2024-09-01 17:39:19,073][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 867.4). Total num frames: 8232960. Throughput: 0: 232.0. Samples: 56822. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:39:19,076][00194] Avg episode reward: [(0, '28.808')]
[2024-09-01 17:39:24,079][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 866.4). Total num frames: 8237056. Throughput: 0: 235.2. Samples: 58184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:24,083][00194] Avg episode reward: [(0, '29.338')]
[2024-09-01 17:39:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 865.6). Total num frames: 8241152. Throughput: 0: 226.0. Samples: 58832. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:29,076][00194] Avg episode reward: [(0, '29.945')]
[2024-09-01 17:39:34,073][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 864.7). Total num frames: 8245248. Throughput: 0: 225.4. Samples: 59980. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:39:34,076][00194] Avg episode reward: [(0, '29.800')]
[2024-09-01 17:39:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 863.9). Total num frames: 8249344. Throughput: 0: 241.8. Samples: 61794. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:39:39,076][00194] Avg episode reward: [(0, '28.606')]
[2024-09-01 17:39:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 863.1). Total num frames: 8253440. Throughput: 0: 238.6. Samples: 62454. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:44,076][00194] Avg episode reward: [(0, '28.662')]
[2024-09-01 17:39:45,294][47741] Updated weights for policy 0, policy_version 2016 (0.0524)
[2024-09-01 17:39:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 862.3). Total num frames: 8257536. Throughput: 0: 221.2. Samples: 63512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:49,079][00194] Avg episode reward: [(0, '29.399')]
[2024-09-01 17:39:54,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 875.7). Total num frames: 8265728. Throughput: 0: 228.1. Samples: 64990. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:39:54,082][00194] Avg episode reward: [(0, '28.389')]
[2024-09-01 17:39:59,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 8269824. Throughput: 0: 241.8. Samples: 65970. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:39:59,076][00194] Avg episode reward: [(0, '28.925')]
[2024-09-01 17:40:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 8273920. Throughput: 0: 226.4. Samples: 67010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:40:04,080][00194] Avg episode reward: [(0, '28.617')]
[2024-09-01 17:40:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 902.5). Total num frames: 8278016. Throughput: 0: 224.3. Samples: 68276.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:40:09,081][00194] Avg episode reward: [(0, '28.685')] [2024-09-01 17:40:14,074][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 8282112. Throughput: 0: 227.1. Samples: 69050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:40:14,077][00194] Avg episode reward: [(0, '28.195')] [2024-09-01 17:40:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8286208. Throughput: 0: 236.0. Samples: 70602. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:19,076][00194] Avg episode reward: [(0, '28.054')] [2024-09-01 17:40:24,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8290304. Throughput: 0: 222.0. Samples: 71786. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:24,076][00194] Avg episode reward: [(0, '27.097')] [2024-09-01 17:40:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8294400. Throughput: 0: 217.0. Samples: 72218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:29,083][00194] Avg episode reward: [(0, '27.508')] [2024-09-01 17:40:29,848][47741] Updated weights for policy 0, policy_version 2026 (0.2936) [2024-09-01 17:40:34,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8302592. Throughput: 0: 238.9. Samples: 74264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:34,079][00194] Avg episode reward: [(0, '27.500')] [2024-09-01 17:40:39,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8306688. Throughput: 0: 227.7. Samples: 75236. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:39,075][00194] Avg episode reward: [(0, '28.245')] [2024-09-01 17:40:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8310784. Throughput: 0: 221.9. Samples: 75956. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:44,081][00194] Avg episode reward: [(0, '28.808')] [2024-09-01 17:40:47,596][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002030_8314880.pth... [2024-09-01 17:40:47,712][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001976_8093696.pth [2024-09-01 17:40:49,074][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8314880. Throughput: 0: 228.5. Samples: 77294. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:49,081][00194] Avg episode reward: [(0, '28.897')] [2024-09-01 17:40:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8318976. Throughput: 0: 236.5. Samples: 78918. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:54,079][00194] Avg episode reward: [(0, '29.124')] [2024-09-01 17:40:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8323072. Throughput: 0: 228.7. Samples: 79342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:40:59,079][00194] Avg episode reward: [(0, '29.053')] [2024-09-01 17:41:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8327168. Throughput: 0: 224.0. Samples: 80684. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:41:04,077][00194] Avg episode reward: [(0, '29.066')] [2024-09-01 17:41:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8331264. Throughput: 0: 236.7. Samples: 82438. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:41:09,080][00194] Avg episode reward: [(0, '29.593')] [2024-09-01 17:41:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8335360. Throughput: 0: 240.9. Samples: 83058. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:41:14,076][00194] Avg episode reward: [(0, '29.257')] [2024-09-01 17:41:14,433][47741] Updated weights for policy 0, policy_version 2036 (0.2112) [2024-09-01 17:41:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8339456. Throughput: 0: 221.8. Samples: 84246. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:41:19,078][00194] Avg episode reward: [(0, '29.091')] [2024-09-01 17:41:24,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8347648. Throughput: 0: 228.8. Samples: 85534. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:24,075][00194] Avg episode reward: [(0, '29.140')] [2024-09-01 17:41:29,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8351744. Throughput: 0: 235.5. Samples: 86552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:29,078][00194] Avg episode reward: [(0, '28.605')] [2024-09-01 17:41:34,078][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 8355840. Throughput: 0: 231.9. Samples: 87732. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:34,081][00194] Avg episode reward: [(0, '27.803')] [2024-09-01 17:41:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8359936. Throughput: 0: 221.4. Samples: 88882. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:39,081][00194] Avg episode reward: [(0, '27.708')] [2024-09-01 17:41:44,073][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8364032. Throughput: 0: 232.6. Samples: 89810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:44,080][00194] Avg episode reward: [(0, '27.955')] [2024-09-01 17:41:49,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8368128. Throughput: 0: 243.6. Samples: 91646. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:49,077][00194] Avg episode reward: [(0, '28.333')] [2024-09-01 17:41:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8372224. Throughput: 0: 224.8. Samples: 92552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:54,076][00194] Avg episode reward: [(0, '28.403')] [2024-09-01 17:41:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8376320. Throughput: 0: 225.5. Samples: 93206. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:41:59,078][00194] Avg episode reward: [(0, '28.141')] [2024-09-01 17:41:59,104][47741] Updated weights for policy 0, policy_version 2046 (0.0538) [2024-09-01 17:42:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8384512. Throughput: 0: 233.2. Samples: 94740. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:42:04,077][00194] Avg episode reward: [(0, '27.790')] [2024-09-01 17:42:09,078][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8388608. Throughput: 0: 233.7. Samples: 96050. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:42:09,082][00194] Avg episode reward: [(0, '27.775')] [2024-09-01 17:42:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8392704. Throughput: 0: 226.2. Samples: 96732. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:42:14,076][00194] Avg episode reward: [(0, '27.054')] [2024-09-01 17:42:19,073][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8396800. Throughput: 0: 226.8. Samples: 97938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:42:19,076][00194] Avg episode reward: [(0, '27.615')] [2024-09-01 17:42:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8400896. Throughput: 0: 243.1. Samples: 99820. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:42:24,076][00194] Avg episode reward: [(0, '28.009')] [2024-09-01 17:42:29,077][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 8404992. Throughput: 0: 229.7. Samples: 100146. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:42:29,086][00194] Avg episode reward: [(0, '27.523')] [2024-09-01 17:42:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8409088. Throughput: 0: 217.7. Samples: 101442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:42:34,082][00194] Avg episode reward: [(0, '27.837')] [2024-09-01 17:42:39,073][00194] Fps is (10 sec: 1229.3, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8417280. Throughput: 0: 233.2. Samples: 103044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:42:39,075][00194] Avg episode reward: [(0, '27.536')] [2024-09-01 17:42:42,661][47741] Updated weights for policy 0, policy_version 2056 (0.0997) [2024-09-01 17:42:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8421376. Throughput: 0: 237.2. Samples: 103882. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:42:44,077][00194] Avg episode reward: [(0, '27.613')] [2024-09-01 17:42:46,930][47728] Signal inference workers to stop experience collection... (100 times) [2024-09-01 17:42:47,001][47741] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-09-01 17:42:48,343][47728] Signal inference workers to resume experience collection... (100 times) [2024-09-01 17:42:48,345][47741] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-09-01 17:42:48,355][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002057_8425472.pth... 
[2024-09-01 17:42:48,470][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002003_8204288.pth [2024-09-01 17:42:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8425472. Throughput: 0: 225.7. Samples: 104896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:42:49,078][00194] Avg episode reward: [(0, '27.437')] [2024-09-01 17:42:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8429568. Throughput: 0: 233.2. Samples: 106542. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:42:54,077][00194] Avg episode reward: [(0, '27.518')] [2024-09-01 17:42:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 8433664. Throughput: 0: 234.2. Samples: 107272. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:42:59,084][00194] Avg episode reward: [(0, '27.548')] [2024-09-01 17:43:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8437760. Throughput: 0: 235.5. Samples: 108536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:43:04,079][00194] Avg episode reward: [(0, '27.638')] [2024-09-01 17:43:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8441856. Throughput: 0: 227.4. Samples: 110054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:43:09,081][00194] Avg episode reward: [(0, '27.944')] [2024-09-01 17:43:14,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8450048. Throughput: 0: 233.7. Samples: 110660. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:43:14,076][00194] Avg episode reward: [(0, '27.805')] [2024-09-01 17:43:19,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8454144. Throughput: 0: 237.1. Samples: 112112. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:43:19,078][00194] Avg episode reward: [(0, '27.478')] [2024-09-01 17:43:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8458240. Throughput: 0: 224.4. Samples: 113140. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:43:24,081][00194] Avg episode reward: [(0, '27.109')] [2024-09-01 17:43:27,810][47741] Updated weights for policy 0, policy_version 2066 (0.0994) [2024-09-01 17:43:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 8462336. Throughput: 0: 227.8. Samples: 114132. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:43:29,076][00194] Avg episode reward: [(0, '28.299')] [2024-09-01 17:43:34,079][00194] Fps is (10 sec: 818.7, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 8466432. Throughput: 0: 238.6. Samples: 115636. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:43:34,084][00194] Avg episode reward: [(0, '28.297')] [2024-09-01 17:43:39,078][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 8470528. Throughput: 0: 231.7. Samples: 116968. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:43:39,083][00194] Avg episode reward: [(0, '28.267')] [2024-09-01 17:43:44,073][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8474624. Throughput: 0: 222.4. Samples: 117278. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:43:44,080][00194] Avg episode reward: [(0, '28.089')] [2024-09-01 17:43:49,073][00194] Fps is (10 sec: 1229.3, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8482816. Throughput: 0: 237.7. Samples: 119232. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:43:49,082][00194] Avg episode reward: [(0, '27.878')] [2024-09-01 17:43:54,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8486912. Throughput: 0: 231.0. Samples: 120448. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:43:54,081][00194] Avg episode reward: [(0, '27.970')] [2024-09-01 17:43:59,075][00194] Fps is (10 sec: 819.0, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8491008. Throughput: 0: 233.6. Samples: 121174. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:43:59,088][00194] Avg episode reward: [(0, '27.649')] [2024-09-01 17:44:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8495104. Throughput: 0: 227.6. Samples: 122354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:44:04,081][00194] Avg episode reward: [(0, '27.529')] [2024-09-01 17:44:09,073][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8499200. Throughput: 0: 246.3. Samples: 124224. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:44:09,082][00194] Avg episode reward: [(0, '27.160')] [2024-09-01 17:44:10,731][47741] Updated weights for policy 0, policy_version 2076 (0.1024) [2024-09-01 17:44:14,078][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 8503296. Throughput: 0: 234.5. Samples: 124684. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:44:14,085][00194] Avg episode reward: [(0, '27.277')] [2024-09-01 17:44:19,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8507392. Throughput: 0: 222.7. Samples: 125656. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:19,076][00194] Avg episode reward: [(0, '27.712')] [2024-09-01 17:44:24,073][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8511488. Throughput: 0: 234.9. Samples: 127538. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:24,083][00194] Avg episode reward: [(0, '28.255')] [2024-09-01 17:44:29,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8519680. Throughput: 0: 245.5. Samples: 128324. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:44:29,078][00194] Avg episode reward: [(0, '28.103')] [2024-09-01 17:44:34,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 8523776. Throughput: 0: 229.8. Samples: 129572. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:34,079][00194] Avg episode reward: [(0, '28.428')] [2024-09-01 17:44:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 8527872. Throughput: 0: 228.4. Samples: 130728. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:39,076][00194] Avg episode reward: [(0, '28.993')] [2024-09-01 17:44:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8531968. Throughput: 0: 231.5. Samples: 131592. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:44,080][00194] Avg episode reward: [(0, '29.369')] [2024-09-01 17:44:45,596][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002084_8536064.pth... [2024-09-01 17:44:45,702][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002030_8314880.pth [2024-09-01 17:44:49,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8536064. Throughput: 0: 242.1. Samples: 133248. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:49,077][00194] Avg episode reward: [(0, '29.197')] [2024-09-01 17:44:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8540160. Throughput: 0: 222.0. Samples: 134216. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:54,079][00194] Avg episode reward: [(0, '29.067')] [2024-09-01 17:44:56,478][47741] Updated weights for policy 0, policy_version 2086 (0.2080) [2024-09-01 17:44:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8544256. Throughput: 0: 221.9. Samples: 134670. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:44:59,077][00194] Avg episode reward: [(0, '28.834')] [2024-09-01 17:45:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8552448. Throughput: 0: 244.7. Samples: 136668. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:04,082][00194] Avg episode reward: [(0, '28.215')] [2024-09-01 17:45:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8556544. Throughput: 0: 225.3. Samples: 137676. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:09,078][00194] Avg episode reward: [(0, '28.374')] [2024-09-01 17:45:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 8560640. Throughput: 0: 222.1. Samples: 138320. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:14,075][00194] Avg episode reward: [(0, '29.109')] [2024-09-01 17:45:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8564736. Throughput: 0: 225.9. Samples: 139738. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:19,084][00194] Avg episode reward: [(0, '29.334')] [2024-09-01 17:45:24,079][00194] Fps is (10 sec: 818.8, 60 sec: 955.6, 300 sec: 930.3). Total num frames: 8568832. Throughput: 0: 241.7. Samples: 141606. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:24,089][00194] Avg episode reward: [(0, '30.070')] [2024-09-01 17:45:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8572928. Throughput: 0: 228.0. Samples: 141850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:45:29,076][00194] Avg episode reward: [(0, '29.908')] [2024-09-01 17:45:34,073][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8577024. Throughput: 0: 221.6. Samples: 143218. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:45:34,082][00194] Avg episode reward: [(0, '29.328')] [2024-09-01 17:45:39,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8585216. Throughput: 0: 236.2. Samples: 144844. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:45:39,076][00194] Avg episode reward: [(0, '29.064')] [2024-09-01 17:45:39,505][47741] Updated weights for policy 0, policy_version 2096 (0.1945) [2024-09-01 17:45:44,075][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8589312. Throughput: 0: 243.1. Samples: 145608. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:45:44,080][00194] Avg episode reward: [(0, '29.126')] [2024-09-01 17:45:49,073][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8589312. Throughput: 0: 224.9. Samples: 146790. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:45:49,084][00194] Avg episode reward: [(0, '29.148')] [2024-09-01 17:45:54,076][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8597504. Throughput: 0: 233.8. Samples: 148198. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:54,087][00194] Avg episode reward: [(0, '29.209')] [2024-09-01 17:45:59,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8601600. Throughput: 0: 236.6. Samples: 148968. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:45:59,076][00194] Avg episode reward: [(0, '29.140')] [2024-09-01 17:46:04,073][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8605696. Throughput: 0: 234.8. Samples: 150304. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:04,082][00194] Avg episode reward: [(0, '28.622')] [2024-09-01 17:46:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8609792. Throughput: 0: 219.6. Samples: 151488. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:09,075][00194] Avg episode reward: [(0, '28.744')] [2024-09-01 17:46:14,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8613888. Throughput: 0: 230.2. Samples: 152210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:14,076][00194] Avg episode reward: [(0, '29.571')] [2024-09-01 17:46:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8617984. Throughput: 0: 242.9. Samples: 154148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:19,076][00194] Avg episode reward: [(0, '28.882')] [2024-09-01 17:46:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8622080. Throughput: 0: 228.8. Samples: 155138. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:24,083][00194] Avg episode reward: [(0, '28.514')] [2024-09-01 17:46:25,234][47741] Updated weights for policy 0, policy_version 2106 (0.1972) [2024-09-01 17:46:27,556][47728] Signal inference workers to stop experience collection... (150 times) [2024-09-01 17:46:27,596][47741] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-09-01 17:46:28,461][47728] Signal inference workers to resume experience collection... (150 times) [2024-09-01 17:46:28,462][47741] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-09-01 17:46:29,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8630272. Throughput: 0: 225.1. Samples: 155736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:29,082][00194] Avg episode reward: [(0, '28.457')] [2024-09-01 17:46:34,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8634368. Throughput: 0: 235.5. Samples: 157388. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:34,078][00194] Avg episode reward: [(0, '28.078')] [2024-09-01 17:46:39,079][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 930.3). Total num frames: 8638464. Throughput: 0: 229.1. Samples: 158510. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:39,082][00194] Avg episode reward: [(0, '28.232')] [2024-09-01 17:46:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8642560. Throughput: 0: 227.2. Samples: 159194. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:44,076][00194] Avg episode reward: [(0, '27.772')] [2024-09-01 17:46:45,914][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002111_8646656.pth... [2024-09-01 17:46:46,010][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002057_8425472.pth [2024-09-01 17:46:49,073][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8646656. Throughput: 0: 236.6. Samples: 160950. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:49,082][00194] Avg episode reward: [(0, '27.768')] [2024-09-01 17:46:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8650752. Throughput: 0: 239.2. Samples: 162254. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:54,080][00194] Avg episode reward: [(0, '27.986')] [2024-09-01 17:46:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8654848. Throughput: 0: 236.0. Samples: 162828. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:46:59,076][00194] Avg episode reward: [(0, '28.048')] [2024-09-01 17:47:04,075][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8663040. Throughput: 0: 226.7. Samples: 164350. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:04,084][00194] Avg episode reward: [(0, '28.526')] [2024-09-01 17:47:07,711][47741] Updated weights for policy 0, policy_version 2116 (0.1196) [2024-09-01 17:47:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8667136. Throughput: 0: 235.0. Samples: 165714. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:09,081][00194] Avg episode reward: [(0, '28.320')] [2024-09-01 17:47:14,073][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8671232. Throughput: 0: 236.3. Samples: 166370. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:14,078][00194] Avg episode reward: [(0, '28.787')] [2024-09-01 17:47:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8675328. Throughput: 0: 222.0. Samples: 167378. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:19,076][00194] Avg episode reward: [(0, '27.880')] [2024-09-01 17:47:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8679424. Throughput: 0: 240.2. Samples: 169318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:24,078][00194] Avg episode reward: [(0, '27.446')] [2024-09-01 17:47:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8683520. Throughput: 0: 235.2. Samples: 169776. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:29,080][00194] Avg episode reward: [(0, '27.774')] [2024-09-01 17:47:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8687616. Throughput: 0: 226.0. Samples: 171122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:34,076][00194] Avg episode reward: [(0, '27.756')] [2024-09-01 17:47:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 8691712. Throughput: 0: 229.8. Samples: 172596. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:39,086][00194] Avg episode reward: [(0, '27.583')] [2024-09-01 17:47:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8699904. Throughput: 0: 232.7. Samples: 173300. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:44,076][00194] Avg episode reward: [(0, '27.100')] [2024-09-01 17:47:49,074][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8704000. Throughput: 0: 228.0. Samples: 174612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:47:49,081][00194] Avg episode reward: [(0, '26.435')] [2024-09-01 17:47:53,501][47741] Updated weights for policy 0, policy_version 2126 (0.1473) [2024-09-01 17:47:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8708096. Throughput: 0: 221.4. Samples: 175676. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 17:47:54,080][00194] Avg episode reward: [(0, '26.903')] [2024-09-01 17:47:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8712192. Throughput: 0: 228.8. Samples: 176666. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:47:59,081][00194] Avg episode reward: [(0, '25.674')] [2024-09-01 17:48:04,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8716288. Throughput: 0: 244.0. Samples: 178360. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:48:04,076][00194] Avg episode reward: [(0, '26.020')] [2024-09-01 17:48:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8720384. Throughput: 0: 227.0. Samples: 179532. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:48:09,076][00194] Avg episode reward: [(0, '26.615')] [2024-09-01 17:48:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8724480. Throughput: 0: 224.8. Samples: 179894. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:48:14,082][00194] Avg episode reward: [(0, '27.573')]
[2024-09-01 17:48:19,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8732672. Throughput: 0: 239.5. Samples: 181900. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:48:19,079][00194] Avg episode reward: [(0, '27.229')]
[2024-09-01 17:48:24,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8736768. Throughput: 0: 232.2. Samples: 183044. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:48:24,078][00194] Avg episode reward: [(0, '27.459')]
[2024-09-01 17:48:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8740864. Throughput: 0: 232.8. Samples: 183776. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 17:48:29,079][00194] Avg episode reward: [(0, '27.559')]
[2024-09-01 17:48:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8744960. Throughput: 0: 230.8. Samples: 184996. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 17:48:34,077][00194] Avg episode reward: [(0, '27.768')]
[2024-09-01 17:48:37,069][47741] Updated weights for policy 0, policy_version 2136 (0.1021)
[2024-09-01 17:48:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8749056. Throughput: 0: 244.8. Samples: 186692. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 17:48:39,078][00194] Avg episode reward: [(0, '27.537')]
[2024-09-01 17:48:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8753152. Throughput: 0: 235.1. Samples: 187244. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 17:48:44,080][00194] Avg episode reward: [(0, '27.537')]
[2024-09-01 17:48:46,362][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002138_8757248.pth...
[2024-09-01 17:48:46,512][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002084_8536064.pth
[2024-09-01 17:48:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8757248. Throughput: 0: 221.6. Samples: 188332. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 17:48:49,076][00194] Avg episode reward: [(0, '27.440')]
[2024-09-01 17:48:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8761344. Throughput: 0: 234.0. Samples: 190064. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:48:54,083][00194] Avg episode reward: [(0, '27.980')]
[2024-09-01 17:48:59,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8769536. Throughput: 0: 246.8. Samples: 191000. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:48:59,077][00194] Avg episode reward: [(0, '28.400')]
[2024-09-01 17:49:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8769536. Throughput: 0: 223.1. Samples: 191940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:49:04,081][00194] Avg episode reward: [(0, '28.569')]
[2024-09-01 17:49:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8777728. Throughput: 0: 225.1. Samples: 193174. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:49:09,080][00194] Avg episode reward: [(0, '28.653')]
[2024-09-01 17:49:14,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8781824. Throughput: 0: 230.7. Samples: 194156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:49:14,076][00194] Avg episode reward: [(0, '28.217')]
[2024-09-01 17:49:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8785920. Throughput: 0: 236.4. Samples: 195632.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:19,081][00194] Avg episode reward: [(0, '28.416')]
[2024-09-01 17:49:21,535][47741] Updated weights for policy 0, policy_version 2146 (0.2257)
[2024-09-01 17:49:24,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8790016. Throughput: 0: 223.9. Samples: 196766. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:24,077][00194] Avg episode reward: [(0, '28.271')]
[2024-09-01 17:49:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8794112. Throughput: 0: 228.2. Samples: 197512. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:29,081][00194] Avg episode reward: [(0, '27.301')]
[2024-09-01 17:49:34,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8802304. Throughput: 0: 243.7. Samples: 199300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:34,079][00194] Avg episode reward: [(0, '26.947')]
[2024-09-01 17:49:39,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8806400. Throughput: 0: 226.1. Samples: 200240. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:39,079][00194] Avg episode reward: [(0, '26.947')]
[2024-09-01 17:49:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8810496. Throughput: 0: 221.2. Samples: 200954. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:44,076][00194] Avg episode reward: [(0, '27.199')]
[2024-09-01 17:49:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8814592. Throughput: 0: 234.3. Samples: 202482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:49,076][00194] Avg episode reward: [(0, '27.960')]
[2024-09-01 17:49:54,077][00194] Fps is (10 sec: 818.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8818688. Throughput: 0: 237.5. Samples: 203862.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:54,085][00194] Avg episode reward: [(0, '28.228')]
[2024-09-01 17:49:59,078][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 8822784. Throughput: 0: 227.4. Samples: 204390. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:49:59,082][00194] Avg episode reward: [(0, '28.568')]
[2024-09-01 17:50:04,073][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 8826880. Throughput: 0: 222.7. Samples: 205654. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:04,079][00194] Avg episode reward: [(0, '29.417')]
[2024-09-01 17:50:05,233][47741] Updated weights for policy 0, policy_version 2156 (0.1494)
[2024-09-01 17:50:07,590][47728] Signal inference workers to stop experience collection... (200 times)
[2024-09-01 17:50:07,690][47741] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-09-01 17:50:09,055][47728] Signal inference workers to resume experience collection... (200 times)
[2024-09-01 17:50:09,057][47741] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-09-01 17:50:09,073][00194] Fps is (10 sec: 1229.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8835072. Throughput: 0: 237.3. Samples: 207442. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:09,076][00194] Avg episode reward: [(0, '29.478')]
[2024-09-01 17:50:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8835072. Throughput: 0: 238.4. Samples: 208240. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:14,077][00194] Avg episode reward: [(0, '28.928')]
[2024-09-01 17:50:19,073][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8839168. Throughput: 0: 221.2. Samples: 209254.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:19,082][00194] Avg episode reward: [(0, '28.281')]
[2024-09-01 17:50:24,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 8847360. Throughput: 0: 234.2. Samples: 210780. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:24,080][00194] Avg episode reward: [(0, '28.533')]
[2024-09-01 17:50:29,075][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8851456. Throughput: 0: 234.6. Samples: 211510. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:29,081][00194] Avg episode reward: [(0, '28.379')]
[2024-09-01 17:50:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8855552. Throughput: 0: 226.9. Samples: 212694. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:34,080][00194] Avg episode reward: [(0, '28.409')]
[2024-09-01 17:50:39,073][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8859648. Throughput: 0: 231.6. Samples: 214282. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:39,078][00194] Avg episode reward: [(0, '27.912')]
[2024-09-01 17:50:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8863744. Throughput: 0: 235.4. Samples: 214984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:44,081][00194] Avg episode reward: [(0, '28.844')]
[2024-09-01 17:50:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8867840. Throughput: 0: 244.4. Samples: 216652. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:49,076][00194] Avg episode reward: [(0, '28.827')]
[2024-09-01 17:50:49,466][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002166_8871936.pth...
[2024-09-01 17:50:49,470][47741] Updated weights for policy 0, policy_version 2166 (0.0041)
[2024-09-01 17:50:49,639][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002111_8646656.pth
[2024-09-01 17:50:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8871936. Throughput: 0: 226.8. Samples: 217646. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:54,076][00194] Avg episode reward: [(0, '29.740')]
[2024-09-01 17:50:59,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 8880128. Throughput: 0: 226.8. Samples: 218448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:50:59,082][00194] Avg episode reward: [(0, '29.780')]
[2024-09-01 17:51:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8884224. Throughput: 0: 236.1. Samples: 219880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:51:04,079][00194] Avg episode reward: [(0, '29.849')]
[2024-09-01 17:51:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8888320. Throughput: 0: 227.7. Samples: 221028. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:51:09,076][00194] Avg episode reward: [(0, '29.272')]
[2024-09-01 17:51:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8892416. Throughput: 0: 227.6. Samples: 221750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:51:14,082][00194] Avg episode reward: [(0, '28.903')]
[2024-09-01 17:51:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8896512. Throughput: 0: 238.1. Samples: 223408. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:51:19,076][00194] Avg episode reward: [(0, '29.191')]
[2024-09-01 17:51:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8900608. Throughput: 0: 234.0. Samples: 224814.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:51:24,076][00194] Avg episode reward: [(0, '29.591')]
[2024-09-01 17:51:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8904704. Throughput: 0: 229.0. Samples: 225288. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:51:29,075][00194] Avg episode reward: [(0, '29.685')]
[2024-09-01 17:51:34,074][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8912896. Throughput: 0: 226.5. Samples: 226844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:51:34,082][00194] Avg episode reward: [(0, '29.773')]
[2024-09-01 17:51:34,477][47741] Updated weights for policy 0, policy_version 2176 (0.1454)
[2024-09-01 17:51:39,074][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8916992. Throughput: 0: 236.5. Samples: 228290. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:51:39,078][00194] Avg episode reward: [(0, '30.039')]
[2024-09-01 17:51:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8921088. Throughput: 0: 232.8. Samples: 228922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:51:44,079][00194] Avg episode reward: [(0, '30.317')]
[2024-09-01 17:51:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8925184. Throughput: 0: 223.9. Samples: 229954. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:51:49,083][00194] Avg episode reward: [(0, '30.376')]
[2024-09-01 17:51:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8929280. Throughput: 0: 234.3. Samples: 231572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:51:54,076][00194] Avg episode reward: [(0, '30.311')]
[2024-09-01 17:51:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8933376. Throughput: 0: 229.8. Samples: 232090.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:51:59,078][00194] Avg episode reward: [(0, '30.406')]
[2024-09-01 17:52:04,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8937472. Throughput: 0: 222.9. Samples: 233440. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:52:04,084][00194] Avg episode reward: [(0, '30.699')]
[2024-09-01 17:52:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8941568. Throughput: 0: 223.0. Samples: 234848. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:52:09,084][00194] Avg episode reward: [(0, '30.460')]
[2024-09-01 17:52:14,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8945664. Throughput: 0: 222.3. Samples: 235290. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 17:52:14,079][00194] Avg episode reward: [(0, '30.489')]
[2024-09-01 17:52:18,948][47741] Updated weights for policy 0, policy_version 2186 (0.1046)
[2024-09-01 17:52:19,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8953856. Throughput: 0: 228.2. Samples: 237114. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:52:19,085][00194] Avg episode reward: [(0, '30.118')]
[2024-09-01 17:52:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8953856. Throughput: 0: 217.3. Samples: 238070. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:52:24,079][00194] Avg episode reward: [(0, '30.606')]
[2024-09-01 17:52:29,074][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8962048. Throughput: 0: 222.6. Samples: 238938. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 17:52:29,085][00194] Avg episode reward: [(0, '30.184')]
[2024-09-01 17:52:34,073][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 930.3). Total num frames: 8966144. Throughput: 0: 226.7. Samples: 240156.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:52:34,076][00194] Avg episode reward: [(0, '30.236')]
[2024-09-01 17:52:39,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8970240. Throughput: 0: 222.0. Samples: 241562. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:52:39,081][00194] Avg episode reward: [(0, '30.433')]
[2024-09-01 17:52:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8974336. Throughput: 0: 222.8. Samples: 242118. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:52:44,076][00194] Avg episode reward: [(0, '30.482')]
[2024-09-01 17:52:46,236][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002192_8978432.pth...
[2024-09-01 17:52:46,358][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002138_8757248.pth
[2024-09-01 17:52:49,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8978432. Throughput: 0: 228.4. Samples: 243720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:52:49,079][00194] Avg episode reward: [(0, '30.506')]
[2024-09-01 17:52:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8982528. Throughput: 0: 230.7. Samples: 245228. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:52:54,076][00194] Avg episode reward: [(0, '29.454')]
[2024-09-01 17:52:59,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8986624. Throughput: 0: 230.8. Samples: 245676. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:52:59,080][00194] Avg episode reward: [(0, '29.401')]
[2024-09-01 17:53:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 8990720. Throughput: 0: 222.1. Samples: 247108.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:04,083][00194] Avg episode reward: [(0, '29.413')]
[2024-09-01 17:53:04,385][47741] Updated weights for policy 0, policy_version 2196 (0.0984)
[2024-09-01 17:53:09,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 8998912. Throughput: 0: 232.1. Samples: 248514. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:09,083][00194] Avg episode reward: [(0, '29.376')]
[2024-09-01 17:53:14,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9003008. Throughput: 0: 230.3. Samples: 249302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:14,079][00194] Avg episode reward: [(0, '28.254')]
[2024-09-01 17:53:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9007104. Throughput: 0: 226.1. Samples: 250332. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:19,084][00194] Avg episode reward: [(0, '28.653')]
[2024-09-01 17:53:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9011200. Throughput: 0: 233.2. Samples: 252054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:24,076][00194] Avg episode reward: [(0, '28.125')]
[2024-09-01 17:53:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9015296. Throughput: 0: 236.8. Samples: 252772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:53:29,076][00194] Avg episode reward: [(0, '27.323')]
[2024-09-01 17:53:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9019392. Throughput: 0: 229.4. Samples: 254044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:53:34,080][00194] Avg episode reward: [(0, '27.434')]
[2024-09-01 17:53:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9023488. Throughput: 0: 225.7. Samples: 255384.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:39,083][00194] Avg episode reward: [(0, '27.793')]
[2024-09-01 17:53:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9031680. Throughput: 0: 235.4. Samples: 256270. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:44,079][00194] Avg episode reward: [(0, '28.300')]
[2024-09-01 17:53:47,791][47741] Updated weights for policy 0, policy_version 2206 (0.0527)
[2024-09-01 17:53:49,075][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9035776. Throughput: 0: 231.1. Samples: 257510. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:49,079][00194] Avg episode reward: [(0, '28.630')]
[2024-09-01 17:53:51,967][47728] Signal inference workers to stop experience collection... (250 times)
[2024-09-01 17:53:52,041][47741] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2024-09-01 17:53:53,579][47728] Signal inference workers to resume experience collection... (250 times)
[2024-09-01 17:53:53,581][47741] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2024-09-01 17:53:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9039872. Throughput: 0: 222.8. Samples: 258538. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:54,076][00194] Avg episode reward: [(0, '28.255')]
[2024-09-01 17:53:59,075][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9043968. Throughput: 0: 228.3. Samples: 259578. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:53:59,078][00194] Avg episode reward: [(0, '28.007')]
[2024-09-01 17:54:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9048064. Throughput: 0: 241.5. Samples: 261200.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:54:04,078][00194] Avg episode reward: [(0, '27.672')]
[2024-09-01 17:54:09,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9052160. Throughput: 0: 224.9. Samples: 262176. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:54:09,078][00194] Avg episode reward: [(0, '27.186')]
[2024-09-01 17:54:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9056256. Throughput: 0: 221.8. Samples: 262752. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:54:14,076][00194] Avg episode reward: [(0, '27.156')]
[2024-09-01 17:54:19,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9064448. Throughput: 0: 236.8. Samples: 264700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:54:19,076][00194] Avg episode reward: [(0, '26.479')]
[2024-09-01 17:54:24,076][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9068544. Throughput: 0: 232.4. Samples: 265844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:54:24,080][00194] Avg episode reward: [(0, '25.856')]
[2024-09-01 17:54:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9072640. Throughput: 0: 228.8. Samples: 266564. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:54:29,083][00194] Avg episode reward: [(0, '26.118')]
[2024-09-01 17:54:33,247][47741] Updated weights for policy 0, policy_version 2216 (0.1225)
[2024-09-01 17:54:34,073][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9076736. Throughput: 0: 227.7. Samples: 267758. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:54:34,081][00194] Avg episode reward: [(0, '26.107')]
[2024-09-01 17:54:39,075][00194] Fps is (10 sec: 819.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9080832. Throughput: 0: 243.5. Samples: 269496.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:54:39,078][00194] Avg episode reward: [(0, '25.984')]
[2024-09-01 17:54:44,079][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 9084928. Throughput: 0: 231.4. Samples: 269990. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:54:44,087][00194] Avg episode reward: [(0, '25.637')]
[2024-09-01 17:54:46,478][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002219_9089024.pth...
[2024-09-01 17:54:46,594][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002166_8871936.pth
[2024-09-01 17:54:49,073][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9089024. Throughput: 0: 220.0. Samples: 271102. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:54:49,078][00194] Avg episode reward: [(0, '26.531')]
[2024-09-01 17:54:54,073][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9093120. Throughput: 0: 235.7. Samples: 272782. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:54:54,075][00194] Avg episode reward: [(0, '26.678')]
[2024-09-01 17:54:59,076][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9101312. Throughput: 0: 245.1. Samples: 273780. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:54:59,081][00194] Avg episode reward: [(0, '26.632')]
[2024-09-01 17:55:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9101312. Throughput: 0: 224.0. Samples: 274778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:55:04,079][00194] Avg episode reward: [(0, '26.347')]
[2024-09-01 17:55:09,074][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9109504. Throughput: 0: 224.2. Samples: 275932.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 17:55:09,081][00194] Avg episode reward: [(0, '26.557')]
[2024-09-01 17:55:14,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9113600. Throughput: 0: 230.9. Samples: 276956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:55:14,076][00194] Avg episode reward: [(0, '26.815')]
[2024-09-01 17:55:17,193][47741] Updated weights for policy 0, policy_version 2226 (0.1650)
[2024-09-01 17:55:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9117696. Throughput: 0: 230.7. Samples: 278140. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:55:19,082][00194] Avg episode reward: [(0, '26.721')]
[2024-09-01 17:55:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9121792. Throughput: 0: 219.4. Samples: 279370. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:55:24,083][00194] Avg episode reward: [(0, '27.443')]
[2024-09-01 17:55:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9125888. Throughput: 0: 225.0. Samples: 280112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:55:29,078][00194] Avg episode reward: [(0, '27.041')]
[2024-09-01 17:55:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9129984. Throughput: 0: 239.2. Samples: 281864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:55:34,076][00194] Avg episode reward: [(0, '26.704')]
[2024-09-01 17:55:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9134080. Throughput: 0: 230.0. Samples: 283134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 17:55:39,080][00194] Avg episode reward: [(0, '25.751')]
[2024-09-01 17:55:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 9138176. Throughput: 0: 217.9. Samples: 283584.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:55:44,079][00194] Avg episode reward: [(0, '25.506')]
[2024-09-01 17:55:49,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9146368. Throughput: 0: 231.1. Samples: 285176. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:55:49,082][00194] Avg episode reward: [(0, '25.400')]
[2024-09-01 17:55:54,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9150464. Throughput: 0: 237.2. Samples: 286606. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:55:54,078][00194] Avg episode reward: [(0, '25.441')]
[2024-09-01 17:55:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9154560. Throughput: 0: 228.1. Samples: 287220. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:55:59,076][00194] Avg episode reward: [(0, '25.553')]
[2024-09-01 17:56:02,337][47741] Updated weights for policy 0, policy_version 2236 (0.2024)
[2024-09-01 17:56:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9158656. Throughput: 0: 227.6. Samples: 288380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:04,075][00194] Avg episode reward: [(0, '25.467')]
[2024-09-01 17:56:09,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9162752. Throughput: 0: 242.8. Samples: 290294. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:09,077][00194] Avg episode reward: [(0, '25.422')]
[2024-09-01 17:56:14,079][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 9166848. Throughput: 0: 235.7. Samples: 290720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:14,087][00194] Avg episode reward: [(0, '25.323')]
[2024-09-01 17:56:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9170944. Throughput: 0: 224.1. Samples: 291948.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:19,080][00194] Avg episode reward: [(0, '25.586')]
[2024-09-01 17:56:24,073][00194] Fps is (10 sec: 1229.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9179136. Throughput: 0: 229.2. Samples: 293450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:24,078][00194] Avg episode reward: [(0, '25.686')]
[2024-09-01 17:56:29,075][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9183232. Throughput: 0: 240.2. Samples: 294394. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:29,078][00194] Avg episode reward: [(0, '26.265')]
[2024-09-01 17:56:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9187328. Throughput: 0: 227.3. Samples: 295404. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:34,077][00194] Avg episode reward: [(0, '26.472')]
[2024-09-01 17:56:39,073][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9191424. Throughput: 0: 224.2. Samples: 296696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:39,077][00194] Avg episode reward: [(0, '26.659')]
[2024-09-01 17:56:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9195520. Throughput: 0: 230.7. Samples: 297602. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:44,081][00194] Avg episode reward: [(0, '26.592')]
[2024-09-01 17:56:44,937][47741] Updated weights for policy 0, policy_version 2246 (0.1615)
[2024-09-01 17:56:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9199616. Throughput: 0: 238.1. Samples: 299094. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:49,076][00194] Avg episode reward: [(0, '26.841')]
[2024-09-01 17:56:50,624][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002247_9203712.pth...
[2024-09-01 17:56:50,754][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002192_8978432.pth
[2024-09-01 17:56:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9203712. Throughput: 0: 218.9. Samples: 300144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:54,076][00194] Avg episode reward: [(0, '26.851')]
[2024-09-01 17:56:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9207808. Throughput: 0: 226.5. Samples: 300910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 17:56:59,082][00194] Avg episode reward: [(0, '26.917')]
[2024-09-01 17:57:04,075][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9216000. Throughput: 0: 233.8. Samples: 302468. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:57:04,079][00194] Avg episode reward: [(0, '27.535')]
[2024-09-01 17:57:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9220096. Throughput: 0: 223.3. Samples: 303500. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:57:09,080][00194] Avg episode reward: [(0, '27.383')]
[2024-09-01 17:57:14,073][00194] Fps is (10 sec: 819.4, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 9224192. Throughput: 0: 218.6. Samples: 304230. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:57:14,084][00194] Avg episode reward: [(0, '27.213')]
[2024-09-01 17:57:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9228288. Throughput: 0: 227.6. Samples: 305646. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 17:57:19,084][00194] Avg episode reward: [(0, '26.823')]
[2024-09-01 17:57:24,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9232384. Throughput: 0: 239.4. Samples: 307468.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:57:24,083][00194] Avg episode reward: [(0, '26.843')] [2024-09-01 17:57:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9236480. Throughput: 0: 226.6. Samples: 307798. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:57:29,089][00194] Avg episode reward: [(0, '26.074')] [2024-09-01 17:57:31,917][47741] Updated weights for policy 0, policy_version 2256 (0.1984) [2024-09-01 17:57:34,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9240576. Throughput: 0: 219.9. Samples: 308990. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:57:34,076][00194] Avg episode reward: [(0, '26.646')] [2024-09-01 17:57:34,305][47728] Signal inference workers to stop experience collection... (300 times) [2024-09-01 17:57:34,358][47741] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-09-01 17:57:35,270][47728] Signal inference workers to resume experience collection... (300 times) [2024-09-01 17:57:35,271][47741] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-09-01 17:57:39,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9248768. Throughput: 0: 235.0. Samples: 310720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:57:39,075][00194] Avg episode reward: [(0, '26.778')] [2024-09-01 17:57:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9248768. Throughput: 0: 236.4. Samples: 311550. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:57:44,078][00194] Avg episode reward: [(0, '26.796')] [2024-09-01 17:57:49,073][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9252864. Throughput: 0: 221.5. Samples: 312434. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 17:57:49,080][00194] Avg episode reward: [(0, '27.605')] [2024-09-01 17:57:54,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9261056. Throughput: 0: 235.6. Samples: 314100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:57:54,081][00194] Avg episode reward: [(0, '27.461')] [2024-09-01 17:57:59,075][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9265152. Throughput: 0: 236.3. Samples: 314866. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:57:59,086][00194] Avg episode reward: [(0, '28.027')] [2024-09-01 17:58:04,080][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 9269248. Throughput: 0: 232.0. Samples: 316086. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:58:04,089][00194] Avg episode reward: [(0, '27.889')] [2024-09-01 17:58:09,076][00194] Fps is (10 sec: 819.2, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 9273344. Throughput: 0: 218.7. Samples: 317310. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 17:58:09,082][00194] Avg episode reward: [(0, '27.408')] [2024-09-01 17:58:14,073][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9277440. Throughput: 0: 231.0. Samples: 318194. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:58:14,076][00194] Avg episode reward: [(0, '28.402')] [2024-09-01 17:58:14,541][47741] Updated weights for policy 0, policy_version 2266 (0.2102) [2024-09-01 17:58:19,073][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9281536. Throughput: 0: 242.9. Samples: 319920. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:58:19,076][00194] Avg episode reward: [(0, '29.030')] [2024-09-01 17:58:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9285632. Throughput: 0: 226.8. Samples: 320926. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:58:24,084][00194] Avg episode reward: [(0, '29.274')] [2024-09-01 17:58:29,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9293824. Throughput: 0: 223.5. Samples: 321606. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:58:29,075][00194] Avg episode reward: [(0, '29.318')] [2024-09-01 17:58:34,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9297920. Throughput: 0: 235.6. Samples: 323036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:58:34,077][00194] Avg episode reward: [(0, '29.675')] [2024-09-01 17:58:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9302016. Throughput: 0: 227.6. Samples: 324340. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:58:39,081][00194] Avg episode reward: [(0, '29.264')] [2024-09-01 17:58:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9306112. Throughput: 0: 223.6. Samples: 324926. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 17:58:44,077][00194] Avg episode reward: [(0, '28.850')] [2024-09-01 17:58:46,026][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002273_9310208.pth... [2024-09-01 17:58:46,130][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002219_9089024.pth [2024-09-01 17:58:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9310208. Throughput: 0: 233.7. Samples: 326602. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 17:58:49,081][00194] Avg episode reward: [(0, '29.030')] [2024-09-01 17:58:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9314304. Throughput: 0: 236.4. Samples: 327948. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 17:58:54,076][00194] Avg episode reward: [(0, '28.378')] [2024-09-01 17:58:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9318400. Throughput: 0: 229.5. Samples: 328522. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:58:59,076][00194] Avg episode reward: [(0, '28.371')] [2024-09-01 17:59:00,085][47741] Updated weights for policy 0, policy_version 2276 (0.0995) [2024-09-01 17:59:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 9326592. Throughput: 0: 225.4. Samples: 330062. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:04,081][00194] Avg episode reward: [(0, '27.959')] [2024-09-01 17:59:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 9330688. Throughput: 0: 234.5. Samples: 331480. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:09,075][00194] Avg episode reward: [(0, '27.625')] [2024-09-01 17:59:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9334784. Throughput: 0: 234.6. Samples: 332162. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:14,078][00194] Avg episode reward: [(0, '27.248')] [2024-09-01 17:59:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9338880. Throughput: 0: 224.4. Samples: 333134. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:19,078][00194] Avg episode reward: [(0, '27.081')] [2024-09-01 17:59:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9342976. Throughput: 0: 238.6. Samples: 335078. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:59:24,076][00194] Avg episode reward: [(0, '27.300')] [2024-09-01 17:59:29,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9347072. Throughput: 0: 237.2. Samples: 335602. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 17:59:29,077][00194] Avg episode reward: [(0, '27.668')] [2024-09-01 17:59:34,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9351168. Throughput: 0: 230.2. Samples: 336962. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:34,077][00194] Avg episode reward: [(0, '27.926')] [2024-09-01 17:59:39,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9355264. Throughput: 0: 229.1. Samples: 338256. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:39,083][00194] Avg episode reward: [(0, '28.114')] [2024-09-01 17:59:42,981][47741] Updated weights for policy 0, policy_version 2286 (0.1024) [2024-09-01 17:59:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9363456. Throughput: 0: 239.0. Samples: 339276. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:44,083][00194] Avg episode reward: [(0, '27.599')] [2024-09-01 17:59:49,077][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9367552. Throughput: 0: 229.8. Samples: 340404. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:49,081][00194] Avg episode reward: [(0, '27.545')] [2024-09-01 17:59:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9371648. Throughput: 0: 222.0. Samples: 341470. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:54,081][00194] Avg episode reward: [(0, '27.211')] [2024-09-01 17:59:59,073][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9375744. Throughput: 0: 225.5. Samples: 342310. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 17:59:59,082][00194] Avg episode reward: [(0, '27.063')] [2024-09-01 18:00:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9379840. Throughput: 0: 239.6. Samples: 343914. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:04,076][00194] Avg episode reward: [(0, '27.299')] [2024-09-01 18:00:09,077][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 9383936. Throughput: 0: 225.4. Samples: 345220. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:09,088][00194] Avg episode reward: [(0, '27.322')] [2024-09-01 18:00:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9388032. Throughput: 0: 221.6. Samples: 345574. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:14,082][00194] Avg episode reward: [(0, '27.807')] [2024-09-01 18:00:19,073][00194] Fps is (10 sec: 1229.3, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9396224. Throughput: 0: 231.0. Samples: 347358. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 18:00:19,081][00194] Avg episode reward: [(0, '27.612')] [2024-09-01 18:00:24,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9400320. Throughput: 0: 229.0. Samples: 348560. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 18:00:24,079][00194] Avg episode reward: [(0, '27.614')] [2024-09-01 18:00:28,520][47741] Updated weights for policy 0, policy_version 2296 (0.1670) [2024-09-01 18:00:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9404416. Throughput: 0: 222.4. Samples: 349284. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 18:00:29,078][00194] Avg episode reward: [(0, '27.375')] [2024-09-01 18:00:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9408512. Throughput: 0: 225.9. Samples: 350568. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:34,081][00194] Avg episode reward: [(0, '27.414')] [2024-09-01 18:00:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9412608. Throughput: 0: 240.0. Samples: 352270. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:39,082][00194] Avg episode reward: [(0, '27.178')] [2024-09-01 18:00:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9416704. Throughput: 0: 234.0. Samples: 352840. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 18:00:44,081][00194] Avg episode reward: [(0, '26.971')] [2024-09-01 18:00:46,282][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002300_9420800.pth... [2024-09-01 18:00:46,397][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002247_9203712.pth [2024-09-01 18:00:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9420800. Throughput: 0: 222.7. Samples: 353934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 18:00:49,075][00194] Avg episode reward: [(0, '27.308')] [2024-09-01 18:00:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9424896. Throughput: 0: 232.3. Samples: 355672. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:54,076][00194] Avg episode reward: [(0, '27.894')] [2024-09-01 18:00:59,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9433088. Throughput: 0: 245.1. Samples: 356604. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:00:59,076][00194] Avg episode reward: [(0, '27.773')] [2024-09-01 18:01:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9437184. Throughput: 0: 231.5. Samples: 357774. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:01:04,081][00194] Avg episode reward: [(0, '27.797')] [2024-09-01 18:01:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 9441280. Throughput: 0: 226.8. Samples: 358764. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:01:09,076][00194] Avg episode reward: [(0, '28.278')] [2024-09-01 18:01:12,428][47741] Updated weights for policy 0, policy_version 2306 (0.1038) [2024-09-01 18:01:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9445376. Throughput: 0: 233.8. Samples: 359806. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:01:14,078][00194] Avg episode reward: [(0, '27.950')] [2024-09-01 18:01:14,656][47728] Signal inference workers to stop experience collection... (350 times) [2024-09-01 18:01:14,691][47741] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-09-01 18:01:16,083][47728] Signal inference workers to resume experience collection... (350 times) [2024-09-01 18:01:16,085][47741] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-09-01 18:01:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9449472. Throughput: 0: 233.6. Samples: 361082. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 18:01:19,076][00194] Avg episode reward: [(0, '27.933')] [2024-09-01 18:01:24,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9453568. Throughput: 0: 224.1. Samples: 362356. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 18:01:24,078][00194] Avg episode reward: [(0, '27.933')] [2024-09-01 18:01:29,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9457664. Throughput: 0: 225.9. Samples: 363004. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 18:01:29,083][00194] Avg episode reward: [(0, '28.447')] [2024-09-01 18:01:34,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9465856. Throughput: 0: 246.0. Samples: 365006. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 18:01:34,078][00194] Avg episode reward: [(0, '28.880')] [2024-09-01 18:01:39,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9465856. Throughput: 0: 231.2. Samples: 366074. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 18:01:39,077][00194] Avg episode reward: [(0, '29.244')] [2024-09-01 18:01:44,073][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9469952. Throughput: 0: 221.7. Samples: 366580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 18:01:44,076][00194] Avg episode reward: [(0, '29.575')] [2024-09-01 18:01:49,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9478144. Throughput: 0: 229.7. Samples: 368112. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 18:01:49,081][00194] Avg episode reward: [(0, '30.006')] [2024-09-01 18:01:54,081][00194] Fps is (10 sec: 1227.8, 60 sec: 955.6, 300 sec: 930.3). Total num frames: 9482240. Throughput: 0: 244.7. Samples: 369778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:01:54,086][00194] Avg episode reward: [(0, '30.527')] [2024-09-01 18:01:56,942][47741] Updated weights for policy 0, policy_version 2316 (0.1467) [2024-09-01 18:01:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9486336. Throughput: 0: 229.6. Samples: 370138. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:01:59,080][00194] Avg episode reward: [(0, '30.686')] [2024-09-01 18:02:04,073][00194] Fps is (10 sec: 819.9, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9490432. Throughput: 0: 230.4. Samples: 371450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:02:04,076][00194] Avg episode reward: [(0, '31.365')] [2024-09-01 18:02:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9494528. Throughput: 0: 241.0. Samples: 373202. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:02:09,082][00194] Avg episode reward: [(0, '31.352')] [2024-09-01 18:02:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9498624. Throughput: 0: 241.0. Samples: 373850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:02:14,076][00194] Avg episode reward: [(0, '32.387')] [2024-09-01 18:02:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9502720. Throughput: 0: 221.9. Samples: 374992. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:02:19,075][00194] Avg episode reward: [(0, '33.308')] [2024-09-01 18:02:23,747][47728] Saving new best policy, reward=33.308! [2024-09-01 18:02:24,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9510912. Throughput: 0: 229.1. Samples: 376384. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:02:24,076][00194] Avg episode reward: [(0, '32.880')] [2024-09-01 18:02:29,082][00194] Fps is (10 sec: 1227.8, 60 sec: 955.6, 300 sec: 930.3). Total num frames: 9515008. Throughput: 0: 239.0. Samples: 377338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:02:29,087][00194] Avg episode reward: [(0, '32.728')] [2024-09-01 18:02:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9519104. Throughput: 0: 227.9. Samples: 378366. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:02:34,076][00194] Avg episode reward: [(0, '32.728')] [2024-09-01 18:02:39,073][00194] Fps is (10 sec: 819.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9523200. Throughput: 0: 216.3. Samples: 379510. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:02:39,076][00194] Avg episode reward: [(0, '32.549')] [2024-09-01 18:02:41,630][47741] Updated weights for policy 0, policy_version 2326 (0.0521) [2024-09-01 18:02:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). 
Total num frames: 9527296. Throughput: 0: 228.6. Samples: 380426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:02:44,076][00194] Avg episode reward: [(0, '32.306')] [2024-09-01 18:02:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9531392. Throughput: 0: 241.6. Samples: 382324. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:02:49,076][00194] Avg episode reward: [(0, '32.847')] [2024-09-01 18:02:50,069][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002328_9535488.pth... [2024-09-01 18:02:50,201][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002273_9310208.pth [2024-09-01 18:02:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 9535488. Throughput: 0: 224.4. Samples: 383300. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:02:54,077][00194] Avg episode reward: [(0, '32.847')] [2024-09-01 18:02:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9539584. Throughput: 0: 218.2. Samples: 383670. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:02:59,079][00194] Avg episode reward: [(0, '32.767')] [2024-09-01 18:03:04,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9547776. Throughput: 0: 233.2. Samples: 385484. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 18:03:04,076][00194] Avg episode reward: [(0, '32.921')] [2024-09-01 18:03:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9551872. Throughput: 0: 228.9. Samples: 386686. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 18:03:09,078][00194] Avg episode reward: [(0, '32.756')] [2024-09-01 18:03:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9555968. Throughput: 0: 222.8. Samples: 387362. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 18:03:14,077][00194] Avg episode reward: [(0, '32.784')] [2024-09-01 18:03:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9560064. Throughput: 0: 230.0. Samples: 388718. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 18:03:19,075][00194] Avg episode reward: [(0, '32.518')] [2024-09-01 18:03:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9564160. Throughput: 0: 249.3. Samples: 390728. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 18:03:24,077][00194] Avg episode reward: [(0, '32.294')] [2024-09-01 18:03:25,777][47741] Updated weights for policy 0, policy_version 2336 (0.0519) [2024-09-01 18:03:29,075][00194] Fps is (10 sec: 819.0, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 9568256. Throughput: 0: 233.9. Samples: 390952. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 18:03:29,081][00194] Avg episode reward: [(0, '32.163')] [2024-09-01 18:03:34,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9572352. Throughput: 0: 220.5. Samples: 392246. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:03:34,091][00194] Avg episode reward: [(0, '32.098')] [2024-09-01 18:03:39,073][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9580544. Throughput: 0: 232.5. Samples: 393764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:03:39,076][00194] Avg episode reward: [(0, '32.891')] [2024-09-01 18:03:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9584640. Throughput: 0: 246.9. Samples: 394782. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:03:44,076][00194] Avg episode reward: [(0, '33.006')] [2024-09-01 18:03:49,073][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9584640. Throughput: 0: 225.4. Samples: 395626. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:03:49,080][00194] Avg episode reward: [(0, '32.666')] [2024-09-01 18:03:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9592832. Throughput: 0: 227.8. Samples: 396936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:03:54,076][00194] Avg episode reward: [(0, '32.172')] [2024-09-01 18:03:59,075][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9596928. Throughput: 0: 234.8. Samples: 397930. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:03:59,082][00194] Avg episode reward: [(0, '30.917')] [2024-09-01 18:04:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9601024. Throughput: 0: 231.9. Samples: 399152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:04,076][00194] Avg episode reward: [(0, '30.305')] [2024-09-01 18:04:09,073][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9605120. Throughput: 0: 216.1. Samples: 400454. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:09,078][00194] Avg episode reward: [(0, '30.406')] [2024-09-01 18:04:11,342][47741] Updated weights for policy 0, policy_version 2346 (0.0520) [2024-09-01 18:04:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9609216. Throughput: 0: 226.6. Samples: 401150. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:14,082][00194] Avg episode reward: [(0, '30.061')] [2024-09-01 18:04:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9613312. Throughput: 0: 236.5. Samples: 402890. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:19,081][00194] Avg episode reward: [(0, '30.212')] [2024-09-01 18:04:24,078][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 9617408. Throughput: 0: 228.1. Samples: 404028. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:24,081][00194] Avg episode reward: [(0, '28.819')] [2024-09-01 18:04:29,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 930.3). Total num frames: 9625600. Throughput: 0: 220.6. Samples: 404710. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:29,076][00194] Avg episode reward: [(0, '28.755')] [2024-09-01 18:04:34,073][00194] Fps is (10 sec: 1229.4, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9629696. Throughput: 0: 233.4. Samples: 406130. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:34,076][00194] Avg episode reward: [(0, '28.670')] [2024-09-01 18:04:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9633792. Throughput: 0: 234.4. Samples: 407486. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:39,076][00194] Avg episode reward: [(0, '28.003')] [2024-09-01 18:04:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9637888. Throughput: 0: 226.1. Samples: 408104. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:44,079][00194] Avg episode reward: [(0, '27.878')] [2024-09-01 18:04:46,661][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002354_9641984.pth... [2024-09-01 18:04:46,779][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002300_9420800.pth [2024-09-01 18:04:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9641984. Throughput: 0: 231.4. Samples: 409564. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:04:49,076][00194] Avg episode reward: [(0, '27.461')] [2024-09-01 18:04:54,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9646080. Throughput: 0: 238.2. Samples: 411172. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:04:54,081][00194] Avg episode reward: [(0, '27.707')] [2024-09-01 18:04:54,880][47741] Updated weights for policy 0, policy_version 2356 (0.1951) [2024-09-01 18:04:58,825][47728] Signal inference workers to stop experience collection... (400 times) [2024-09-01 18:04:58,938][47741] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-09-01 18:04:59,076][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9650176. Throughput: 0: 231.8. Samples: 411580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 18:04:59,087][00194] Avg episode reward: [(0, '27.191')] [2024-09-01 18:05:00,578][47728] Signal inference workers to resume experience collection... (400 times) [2024-09-01 18:05:00,578][47741] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-09-01 18:05:04,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9654272. Throughput: 0: 224.4. Samples: 412990. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:05:04,080][00194] Avg episode reward: [(0, '27.787')] [2024-09-01 18:05:09,077][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9662464. Throughput: 0: 230.0. Samples: 414378. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:09,085][00194] Avg episode reward: [(0, '27.479')] [2024-09-01 18:05:14,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9666560. Throughput: 0: 233.1. Samples: 415198. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:14,079][00194] Avg episode reward: [(0, '28.152')] [2024-09-01 18:05:19,073][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9670656. Throughput: 0: 223.9. Samples: 416204. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:19,076][00194] Avg episode reward: [(0, '28.340')] [2024-09-01 18:05:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 9674752. Throughput: 0: 232.1. Samples: 417932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:24,076][00194] Avg episode reward: [(0, '28.312')] [2024-09-01 18:05:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9678848. Throughput: 0: 234.4. Samples: 418650. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:29,078][00194] Avg episode reward: [(0, '28.669')] [2024-09-01 18:05:34,074][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9682944. Throughput: 0: 229.9. Samples: 419910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:34,080][00194] Avg episode reward: [(0, '29.460')] [2024-09-01 18:05:39,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9687040. Throughput: 0: 227.9. Samples: 421428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:39,084][00194] Avg episode reward: [(0, '28.777')] [2024-09-01 18:05:39,969][47741] Updated weights for policy 0, policy_version 2366 (0.0062) [2024-09-01 18:05:44,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9695232. Throughput: 0: 235.1. Samples: 422158. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:44,078][00194] Avg episode reward: [(0, '28.424')] [2024-09-01 18:05:49,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 9699328. Throughput: 0: 233.3. Samples: 423490. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:49,079][00194] Avg episode reward: [(0, '28.188')] [2024-09-01 18:05:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9703424. Throughput: 0: 226.4. Samples: 424564. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:54,075][00194] Avg episode reward: [(0, '27.966')] [2024-09-01 18:05:59,076][00194] Fps is (10 sec: 819.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9707520. Throughput: 0: 231.1. Samples: 425600. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:05:59,083][00194] Avg episode reward: [(0, '28.226')] [2024-09-01 18:06:04,075][00194] Fps is (10 sec: 409.5, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 9707520. Throughput: 0: 231.2. Samples: 426610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 18:06:04,094][00194] Avg episode reward: [(0, '28.003')] [2024-09-01 18:06:09,076][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9711616. Throughput: 0: 209.3. Samples: 427352. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 18:06:09,086][00194] Avg episode reward: [(0, '28.003')] [2024-09-01 18:06:14,073][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9715712. Throughput: 0: 201.1. Samples: 427700. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:06:14,076][00194] Avg episode reward: [(0, '28.529')] [2024-09-01 18:06:19,073][00194] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9719808. Throughput: 0: 205.4. Samples: 429152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 18:06:19,075][00194] Avg episode reward: [(0, '28.474')] [2024-09-01 18:06:24,074][00194] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9723904. Throughput: 0: 208.0. Samples: 430786. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 18:06:24,085][00194] Avg episode reward: [(0, '29.285')] [2024-09-01 18:06:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 9728000. Throughput: 0: 204.7. Samples: 431370. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:06:29,076][00194] Avg episode reward: [(0, '28.261')]
[2024-09-01 18:06:30,004][47741] Updated weights for policy 0, policy_version 2376 (0.1544)
[2024-09-01 18:06:34,073][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9732096. Throughput: 0: 200.8. Samples: 432528. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:06:34,082][00194] Avg episode reward: [(0, '28.600')]
[2024-09-01 18:06:39,073][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9740288. Throughput: 0: 209.6. Samples: 433996. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:06:39,075][00194] Avg episode reward: [(0, '29.039')]
[2024-09-01 18:06:44,074][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9744384. Throughput: 0: 205.1. Samples: 434828. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:06:44,078][00194] Avg episode reward: [(0, '29.022')]
[2024-09-01 18:06:48,050][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002380_9748480.pth...
[2024-09-01 18:06:48,214][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002328_9535488.pth
[2024-09-01 18:06:49,080][00194] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 902.5). Total num frames: 9748480. Throughput: 0: 205.1. Samples: 435840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:06:49,084][00194] Avg episode reward: [(0, '28.680')]
[2024-09-01 18:06:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9752576. Throughput: 0: 216.9. Samples: 437114. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:06:54,075][00194] Avg episode reward: [(0, '28.503')]
[2024-09-01 18:06:59,074][00194] Fps is (10 sec: 819.7, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 9756672. Throughput: 0: 228.8. Samples: 437996.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:06:59,077][00194] Avg episode reward: [(0, '28.928')]
[2024-09-01 18:07:04,075][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9760768. Throughput: 0: 229.1. Samples: 439462. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:07:04,087][00194] Avg episode reward: [(0, '30.595')]
[2024-09-01 18:07:09,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9764864. Throughput: 0: 216.2. Samples: 440516. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:09,085][00194] Avg episode reward: [(0, '31.551')]
[2024-09-01 18:07:14,073][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9768960. Throughput: 0: 218.8. Samples: 441218. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:14,076][00194] Avg episode reward: [(0, '31.883')]
[2024-09-01 18:07:14,662][47741] Updated weights for policy 0, policy_version 2386 (0.1149)
[2024-09-01 18:07:19,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9777152. Throughput: 0: 233.0. Samples: 443014. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:07:19,076][00194] Avg episode reward: [(0, '31.932')]
[2024-09-01 18:07:24,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9777152. Throughput: 0: 224.4. Samples: 444096. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:07:24,076][00194] Avg episode reward: [(0, '32.186')]
[2024-09-01 18:07:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9785344. Throughput: 0: 218.2. Samples: 444648. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:07:29,084][00194] Avg episode reward: [(0, '31.895')]
[2024-09-01 18:07:34,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9789440. Throughput: 0: 229.2. Samples: 446154.
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:34,082][00194] Avg episode reward: [(0, '30.308')]
[2024-09-01 18:07:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9793536. Throughput: 0: 231.8. Samples: 447546. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:39,079][00194] Avg episode reward: [(0, '30.303')]
[2024-09-01 18:07:44,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9797632. Throughput: 0: 226.8. Samples: 448200. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:44,077][00194] Avg episode reward: [(0, '30.197')]
[2024-09-01 18:07:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 9801728. Throughput: 0: 223.7. Samples: 449526. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:49,082][00194] Avg episode reward: [(0, '30.171')]
[2024-09-01 18:07:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9805824. Throughput: 0: 238.4. Samples: 451244. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:54,079][00194] Avg episode reward: [(0, '29.821')]
[2024-09-01 18:07:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9809920. Throughput: 0: 237.3. Samples: 451896. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:07:59,076][00194] Avg episode reward: [(0, '29.821')]
[2024-09-01 18:08:00,456][47741] Updated weights for policy 0, policy_version 2396 (0.1299)
[2024-09-01 18:08:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9814016. Throughput: 0: 218.1. Samples: 452828. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:08:04,081][00194] Avg episode reward: [(0, '29.633')]
[2024-09-01 18:08:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9822208. Throughput: 0: 228.3. Samples: 454370.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:08:09,081][00194] Avg episode reward: [(0, '29.469')]
[2024-09-01 18:08:14,074][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9826304. Throughput: 0: 237.4. Samples: 455330. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:08:14,081][00194] Avg episode reward: [(0, '30.367')]
[2024-09-01 18:08:19,076][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 9830400. Throughput: 0: 226.1. Samples: 456328. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:08:19,084][00194] Avg episode reward: [(0, '31.097')]
[2024-09-01 18:08:24,073][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9834496. Throughput: 0: 228.7. Samples: 457838. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:08:24,077][00194] Avg episode reward: [(0, '31.731')]
[2024-09-01 18:08:29,073][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9838592. Throughput: 0: 229.9. Samples: 458544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:08:29,077][00194] Avg episode reward: [(0, '31.720')]
[2024-09-01 18:08:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9842688. Throughput: 0: 235.6. Samples: 460128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:08:34,077][00194] Avg episode reward: [(0, '31.138')]
[2024-09-01 18:08:39,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9846784. Throughput: 0: 221.2. Samples: 461198. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:08:39,084][00194] Avg episode reward: [(0, '30.651')]
[2024-09-01 18:08:44,079][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 9854976. Throughput: 0: 224.2. Samples: 461986.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:08:44,090][00194] Avg episode reward: [(0, '30.588')]
[2024-09-01 18:08:44,579][47741] Updated weights for policy 0, policy_version 2406 (0.1492)
[2024-09-01 18:08:46,895][47728] Signal inference workers to stop experience collection... (450 times)
[2024-09-01 18:08:46,933][47741] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-09-01 18:08:47,909][47728] Signal inference workers to resume experience collection... (450 times)
[2024-09-01 18:08:47,910][47741] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-09-01 18:08:47,912][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002407_9859072.pth...
[2024-09-01 18:08:48,022][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002354_9641984.pth
[2024-09-01 18:08:49,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9859072. Throughput: 0: 237.6. Samples: 463522. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:08:49,078][00194] Avg episode reward: [(0, '30.072')]
[2024-09-01 18:08:54,073][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9863168. Throughput: 0: 226.3. Samples: 464552. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:08:54,081][00194] Avg episode reward: [(0, '30.718')]
[2024-09-01 18:08:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9867264. Throughput: 0: 220.9. Samples: 465270. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 18:08:59,079][00194] Avg episode reward: [(0, '30.633')]
[2024-09-01 18:09:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9871360. Throughput: 0: 231.8. Samples: 466760.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:09:04,076][00194] Avg episode reward: [(0, '30.259')]
[2024-09-01 18:09:09,077][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 9875456. Throughput: 0: 231.8. Samples: 468270. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:09:09,081][00194] Avg episode reward: [(0, '29.800')]
[2024-09-01 18:09:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9879552. Throughput: 0: 223.7. Samples: 468610. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 18:09:14,088][00194] Avg episode reward: [(0, '29.922')]
[2024-09-01 18:09:19,073][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9883648. Throughput: 0: 217.3. Samples: 469908. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 18:09:19,076][00194] Avg episode reward: [(0, '29.090')]
[2024-09-01 18:09:24,074][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9891840. Throughput: 0: 232.0. Samples: 471638. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:09:24,084][00194] Avg episode reward: [(0, '28.728')]
[2024-09-01 18:09:29,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9891840. Throughput: 0: 227.6. Samples: 472226. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:09:29,077][00194] Avg episode reward: [(0, '28.489')]
[2024-09-01 18:09:29,460][47741] Updated weights for policy 0, policy_version 2416 (0.1013)
[2024-09-01 18:09:34,074][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9895936. Throughput: 0: 220.9. Samples: 473464. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:09:34,082][00194] Avg episode reward: [(0, '28.908')]
[2024-09-01 18:09:39,073][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9904128. Throughput: 0: 227.5. Samples: 474788.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:09:39,076][00194] Avg episode reward: [(0, '28.223')]
[2024-09-01 18:09:44,073][00194] Fps is (10 sec: 1228.8, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 9908224. Throughput: 0: 232.3. Samples: 475724. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:09:44,077][00194] Avg episode reward: [(0, '29.015')]
[2024-09-01 18:09:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9912320. Throughput: 0: 222.4. Samples: 476766. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:09:49,081][00194] Avg episode reward: [(0, '29.693')]
[2024-09-01 18:09:54,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9916416. Throughput: 0: 217.4. Samples: 478054. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 18:09:54,076][00194] Avg episode reward: [(0, '30.028')]
[2024-09-01 18:09:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9920512. Throughput: 0: 228.7. Samples: 478902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:09:59,081][00194] Avg episode reward: [(0, '30.401')]
[2024-09-01 18:10:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9924608. Throughput: 0: 236.1. Samples: 480534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:10:04,076][00194] Avg episode reward: [(0, '29.947')]
[2024-09-01 18:10:09,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9928704. Throughput: 0: 219.7. Samples: 481526. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:10:09,076][00194] Avg episode reward: [(0, '29.859')]
[2024-09-01 18:10:14,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9932800. Throughput: 0: 220.9. Samples: 482166.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:10:14,076][00194] Avg episode reward: [(0, '29.654')]
[2024-09-01 18:10:14,473][47741] Updated weights for policy 0, policy_version 2426 (0.2010)
[2024-09-01 18:10:19,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9940992. Throughput: 0: 229.9. Samples: 483810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:10:19,079][00194] Avg episode reward: [(0, '29.392')]
[2024-09-01 18:10:24,073][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9945088. Throughput: 0: 223.9. Samples: 484862. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:10:24,076][00194] Avg episode reward: [(0, '29.002')]
[2024-09-01 18:10:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9949184. Throughput: 0: 218.8. Samples: 485570. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:10:29,079][00194] Avg episode reward: [(0, '28.779')]
[2024-09-01 18:10:34,073][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 9953280. Throughput: 0: 227.9. Samples: 487022. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:10:34,081][00194] Avg episode reward: [(0, '28.756')]
[2024-09-01 18:10:39,074][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9957376. Throughput: 0: 235.3. Samples: 488642. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:10:39,081][00194] Avg episode reward: [(0, '27.985')]
[2024-09-01 18:10:44,078][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 9961472. Throughput: 0: 223.0. Samples: 488938. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:10:44,085][00194] Avg episode reward: [(0, '27.912')]
[2024-09-01 18:10:46,673][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002433_9965568.pth...
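The checkpoint paths in this log encode two numbers in their filename: the policy version and the total environment-step count at save time (here each version corresponds to 4096 frames). A minimal sketch of decoding them; `parse_checkpoint_name` is a hypothetical helper, not part of Sample Factory, and only assumes the `checkpoint_<version>_<env_steps>.pth` pattern visible above:

```python
import re

# Hypothetical helper: decode checkpoint names such as
# checkpoint_000002433_9965568.pth into (policy_version, env_steps).
CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def parse_checkpoint_name(path):
    m = CKPT_RE.search(path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path}")
    return int(m.group(1)), int(m.group(2))

version, env_steps = parse_checkpoint_name(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002433_9965568.pth"
)
# In this particular run, env_steps == version * 4096
# (4096 frames collected per policy version).
assert env_steps == version * 4096
```

This makes it easy to see, for instance, that the periodic "Removing" lines delete the oldest kept checkpoint while the newest two survive.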
[2024-09-01 18:10:46,784][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002380_9748480.pth
[2024-09-01 18:10:49,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9965568. Throughput: 0: 218.4. Samples: 490360. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 18:10:49,085][00194] Avg episode reward: [(0, '27.814')]
[2024-09-01 18:10:54,073][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 9969664. Throughput: 0: 233.8. Samples: 492046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:10:54,085][00194] Avg episode reward: [(0, '28.309')]
[2024-09-01 18:10:59,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9973760. Throughput: 0: 235.5. Samples: 492762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:10:59,076][00194] Avg episode reward: [(0, '28.908')]
[2024-09-01 18:10:59,718][47741] Updated weights for policy 0, policy_version 2436 (0.0040)
[2024-09-01 18:11:04,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 9977856. Throughput: 0: 221.2. Samples: 493766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:11:04,077][00194] Avg episode reward: [(0, '28.967')]
[2024-09-01 18:11:09,073][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9986048. Throughput: 0: 226.7. Samples: 495062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:11:09,076][00194] Avg episode reward: [(0, '28.999')]
[2024-09-01 18:11:14,076][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 9990144. Throughput: 0: 233.6. Samples: 496082. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 18:11:14,079][00194] Avg episode reward: [(0, '28.937')]
[2024-09-01 18:11:19,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9994240. Throughput: 0: 223.0. Samples: 497058.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 18:11:19,081][00194] Avg episode reward: [(0, '29.295')]
[2024-09-01 18:11:24,073][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9998336. Throughput: 0: 218.9. Samples: 498492. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 18:11:24,082][00194] Avg episode reward: [(0, '30.067')]
[2024-09-01 18:11:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 10002432. Throughput: 0: 227.8. Samples: 499188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 18:11:29,081][00194] Avg episode reward: [(0, '30.010')]
[2024-09-01 18:11:30,298][47728] Stopping Batcher_0...
[2024-09-01 18:11:30,300][47728] Loop batcher_evt_loop terminating...
[2024-09-01 18:11:30,298][00194] Component Batcher_0 stopped!
[2024-09-01 18:11:30,443][47741] Weights refcount: 2 0
[2024-09-01 18:11:30,445][47741] Stopping InferenceWorker_p0-w0...
[2024-09-01 18:11:30,450][47741] Loop inference_proc0-0_evt_loop terminating...
[2024-09-01 18:11:30,446][00194] Component InferenceWorker_p0-w0 stopped!
[2024-09-01 18:11:30,788][00194] Component RolloutWorker_w0 stopped!
[2024-09-01 18:11:30,797][47742] Stopping RolloutWorker_w0...
[2024-09-01 18:11:30,798][47742] Loop rollout_proc0_evt_loop terminating...
[2024-09-01 18:11:30,801][00194] Component RolloutWorker_w2 stopped!
[2024-09-01 18:11:30,810][47744] Stopping RolloutWorker_w2...
[2024-09-01 18:11:30,810][47744] Loop rollout_proc2_evt_loop terminating...
[2024-09-01 18:11:30,824][00194] Component RolloutWorker_w4 stopped!
[2024-09-01 18:11:30,836][47745] Stopping RolloutWorker_w4...
[2024-09-01 18:11:30,837][47745] Loop rollout_proc4_evt_loop terminating...
[2024-09-01 18:11:30,861][47747] Stopping RolloutWorker_w5...
[2024-09-01 18:11:30,870][47747] Loop rollout_proc5_evt_loop terminating...
[2024-09-01 18:11:30,861][00194] Component RolloutWorker_w5 stopped!
[2024-09-01 18:11:30,876][00194] Component RolloutWorker_w6 stopped!
[2024-09-01 18:11:30,885][47748] Stopping RolloutWorker_w6...
[2024-09-01 18:11:30,886][47748] Loop rollout_proc6_evt_loop terminating...
[2024-09-01 18:11:30,935][47743] Stopping RolloutWorker_w1...
[2024-09-01 18:11:30,935][00194] Component RolloutWorker_w1 stopped!
[2024-09-01 18:11:30,959][47743] Loop rollout_proc1_evt_loop terminating...
[2024-09-01 18:11:30,988][47749] Stopping RolloutWorker_w7...
[2024-09-01 18:11:30,988][00194] Component RolloutWorker_w7 stopped!
[2024-09-01 18:11:30,988][47749] Loop rollout_proc7_evt_loop terminating...
[2024-09-01 18:11:31,044][47746] Stopping RolloutWorker_w3...
[2024-09-01 18:11:31,051][47746] Loop rollout_proc3_evt_loop terminating...
[2024-09-01 18:11:31,046][00194] Component RolloutWorker_w3 stopped!
[2024-09-01 18:11:36,252][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002444_10010624.pth...
[2024-09-01 18:11:36,375][47728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002407_9859072.pth
[2024-09-01 18:11:36,389][47728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002444_10010624.pth...
[2024-09-01 18:11:36,596][47728] Stopping LearnerWorker_p0...
[2024-09-01 18:11:36,598][47728] Loop learner_proc0_evt_loop terminating...
[2024-09-01 18:11:36,603][00194] Component LearnerWorker_p0 stopped!
[2024-09-01 18:11:36,608][00194] Waiting for process learner_proc0 to stop...
[2024-09-01 18:11:37,494][00194] Waiting for process inference_proc0-0 to join...
[2024-09-01 18:11:37,502][00194] Waiting for process rollout_proc0 to join...
[2024-09-01 18:11:37,589][00194] Waiting for process rollout_proc1 to join...
[2024-09-01 18:11:37,782][00194] Waiting for process rollout_proc2 to join...
[2024-09-01 18:11:37,788][00194] Waiting for process rollout_proc3 to join...
[2024-09-01 18:11:37,799][00194] Waiting for process rollout_proc4 to join...
[2024-09-01 18:11:37,807][00194] Waiting for process rollout_proc5 to join...
[2024-09-01 18:11:37,815][00194] Waiting for process rollout_proc6 to join...
[2024-09-01 18:11:37,820][00194] Waiting for process rollout_proc7 to join...
[2024-09-01 18:11:37,826][00194] Batcher 0 profile tree view:
batching: 10.3939, releasing_batches: 0.1251
[2024-09-01 18:11:37,829][00194] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0054
  wait_policy_total: 30.4037
update_model: 76.2508
  weight_update: 0.1034
one_step: 0.0511
  handle_policy_step: 1401.8384
    deserialize: 44.4015, stack: 7.9121, obs_to_device_normalize: 242.5023, forward: 1021.7831, send_messages: 32.5878
    prepare_outputs: 16.4232
      to_cpu: 1.7060
[2024-09-01 18:11:37,832][00194] Learner 0 profile tree view:
misc: 0.0031, prepare_batch: 612.7118
train: 1557.2614
  epoch_init: 0.0054, minibatch_init: 0.0073, losses_postprocess: 0.0876, kl_divergence: 0.2529, after_optimizer: 1.2758
  calculate_losses: 758.4823
    losses_init: 0.0021, forward_head: 677.4795, bptt_initial: 1.9713, tail: 1.6671, advantages_returns: 0.1196, losses: 0.8321
    bptt: 76.1046
      bptt_forward_core: 75.6967
  update: 796.8090
    clip: 1.8411
[2024-09-01 18:11:37,834][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5442, enqueue_policy_requests: 27.1591, env_step: 810.8432, overhead: 19.9112, complete_rollouts: 8.6195
save_policy_outputs: 21.0470
  split_output_tensors: 6.7242
[2024-09-01 18:11:37,835][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3561, enqueue_policy_requests: 27.7497, env_step: 799.2099, overhead: 18.5998, complete_rollouts: 8.5441
save_policy_outputs: 19.9770
  split_output_tensors: 6.8093
[2024-09-01 18:11:37,837][00194] Loop Runner_EvtLoop terminating...
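The periodic status lines throughout this log have a fixed shape that makes them easy to post-process, e.g. to plot the FPS windows over time. A small sketch (`parse_fps_line` is a hypothetical helper written against the exact line format seen above, not a Sample Factory API):

```python
import re

# Matches status lines like:
# "Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 9682944."
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)\."
)

def parse_fps_line(line):
    """Extract the rolling FPS windows and frame counter, or None if absent."""
    m = FPS_RE.search(line)
    if m is None:
        return None
    fps10, fps60, fps300, frames = m.groups()
    return {"fps_10s": float(fps10), "fps_60s": float(fps60),
            "fps_300s": float(fps300), "frames": int(frames)}

line = ("[2024-09-01 18:11:29,073][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, "
        "300 sec: 916.4). Total num frames: 10002432. Throughput: 0: 227.8. Samples: 499188.")
stats = parse_fps_line(line)
assert stats["frames"] == 10002432
```

Applied over the whole log, the three windows show how the 10-second FPS oscillates between roughly 410 and 1230 while the 300-second average stays near 900.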
[2024-09-01 18:11:37,839][00194] Runner profile tree view:
main_loop: 2211.5626
[2024-09-01 18:11:37,841][00194] Collected {0: 10010624}, FPS: 903.8
[2024-09-01 18:11:37,890][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 18:11:37,898][00194] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-01 18:11:37,903][00194] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-01 18:11:37,905][00194] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-01 18:11:37,909][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-01 18:11:37,912][00194] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-01 18:11:37,913][00194] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-01 18:11:37,919][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-01 18:11:37,921][00194] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-01 18:11:37,922][00194] Adding new argument 'hf_repository'='jarski/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-01 18:11:37,924][00194] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-01 18:11:37,925][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-01 18:11:37,926][00194] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-01 18:11:37,928][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-01 18:11:37,930][00194] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-01 18:11:37,944][00194] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 18:11:37,955][00194] RunningMeanStd input shape: (1,)
[2024-09-01 18:11:37,980][00194] ConvEncoder: input_channels=3
[2024-09-01 18:11:38,038][00194] Conv encoder output size: 512
[2024-09-01 18:11:38,041][00194] Policy head output size: 512
[2024-09-01 18:11:38,070][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002444_10010624.pth...
[2024-09-01 18:11:38,731][00194] Num frames 100...
[2024-09-01 18:11:38,935][00194] Num frames 200...
[2024-09-01 18:11:39,168][00194] Num frames 300...
[2024-09-01 18:11:39,401][00194] Num frames 400...
[2024-09-01 18:11:39,613][00194] Num frames 500...
[2024-09-01 18:11:39,834][00194] Num frames 600...
[2024-09-01 18:11:39,946][00194] Avg episode rewards: #0: 12.250, true rewards: #0: 6.250
[2024-09-01 18:11:39,948][00194] Avg episode reward: 12.250, avg true_objective: 6.250
[2024-09-01 18:11:40,113][00194] Num frames 700...
[2024-09-01 18:11:40,322][00194] Num frames 800...
[2024-09-01 18:11:40,521][00194] Num frames 900...
[2024-09-01 18:11:40,738][00194] Num frames 1000...
[2024-09-01 18:11:40,946][00194] Num frames 1100...
[2024-09-01 18:11:41,160][00194] Num frames 1200...
[2024-09-01 18:11:41,357][00194] Num frames 1300...
[2024-09-01 18:11:41,473][00194] Avg episode rewards: #0: 13.145, true rewards: #0: 6.645
[2024-09-01 18:11:41,476][00194] Avg episode reward: 13.145, avg true_objective: 6.645
[2024-09-01 18:11:41,625][00194] Num frames 1400...
[2024-09-01 18:11:41,835][00194] Num frames 1500...
[2024-09-01 18:11:42,042][00194] Num frames 1600...
[2024-09-01 18:11:42,279][00194] Num frames 1700...
[2024-09-01 18:11:42,473][00194] Num frames 1800...
[2024-09-01 18:11:42,684][00194] Avg episode rewards: #0: 12.243, true rewards: #0: 6.243
[2024-09-01 18:11:42,686][00194] Avg episode reward: 12.243, avg true_objective: 6.243
[2024-09-01 18:11:42,743][00194] Num frames 1900...
[2024-09-01 18:11:42,953][00194] Num frames 2000...
[2024-09-01 18:11:43,175][00194] Num frames 2100...
[2024-09-01 18:11:43,408][00194] Num frames 2200...
[2024-09-01 18:11:43,817][00194] Num frames 2300...
[2024-09-01 18:11:44,193][00194] Num frames 2400...
[2024-09-01 18:11:44,413][00194] Num frames 2500...
[2024-09-01 18:11:44,636][00194] Num frames 2600...
[2024-09-01 18:11:44,857][00194] Num frames 2700...
[2024-09-01 18:11:44,927][00194] Avg episode rewards: #0: 12.763, true rewards: #0: 6.762
[2024-09-01 18:11:44,930][00194] Avg episode reward: 12.763, avg true_objective: 6.762
[2024-09-01 18:11:45,137][00194] Num frames 2800...
[2024-09-01 18:11:45,356][00194] Num frames 2900...
[2024-09-01 18:11:45,566][00194] Num frames 3000...
[2024-09-01 18:11:45,802][00194] Num frames 3100...
[2024-09-01 18:11:46,017][00194] Num frames 3200...
[2024-09-01 18:11:46,239][00194] Num frames 3300...
[2024-09-01 18:11:46,460][00194] Num frames 3400...
[2024-09-01 18:11:46,684][00194] Num frames 3500...
[2024-09-01 18:11:46,918][00194] Num frames 3600...
[2024-09-01 18:11:47,248][00194] Num frames 3700...
[2024-09-01 18:11:47,533][00194] Num frames 3800...
[2024-09-01 18:11:47,831][00194] Num frames 3900...
[2024-09-01 18:11:48,120][00194] Num frames 4000...
[2024-09-01 18:11:48,416][00194] Num frames 4100...
[2024-09-01 18:11:48,725][00194] Num frames 4200...
[2024-09-01 18:11:49,072][00194] Num frames 4300...
[2024-09-01 18:11:49,281][00194] Avg episode rewards: #0: 19.292, true rewards: #0: 8.692
[2024-09-01 18:11:49,285][00194] Avg episode reward: 19.292, avg true_objective: 8.692
[2024-09-01 18:11:49,485][00194] Num frames 4400...
[2024-09-01 18:11:49,830][00194] Num frames 4500...
[2024-09-01 18:11:50,087][00194] Num frames 4600...
[2024-09-01 18:11:50,306][00194] Num frames 4700...
[2024-09-01 18:11:50,510][00194] Num frames 4800...
[2024-09-01 18:11:50,725][00194] Num frames 4900...
[2024-09-01 18:11:50,953][00194] Num frames 5000...
[2024-09-01 18:11:51,203][00194] Num frames 5100...
[2024-09-01 18:11:51,365][00194] Avg episode rewards: #0: 19.243, true rewards: #0: 8.577
[2024-09-01 18:11:51,367][00194] Avg episode reward: 19.243, avg true_objective: 8.577
[2024-09-01 18:11:51,488][00194] Num frames 5200...
[2024-09-01 18:11:51,699][00194] Num frames 5300...
[2024-09-01 18:11:51,911][00194] Num frames 5400...
[2024-09-01 18:11:52,132][00194] Num frames 5500...
[2024-09-01 18:11:52,337][00194] Num frames 5600...
[2024-09-01 18:11:52,550][00194] Num frames 5700...
[2024-09-01 18:11:52,780][00194] Num frames 5800...
[2024-09-01 18:11:53,000][00194] Num frames 5900...
[2024-09-01 18:11:53,226][00194] Num frames 6000...
[2024-09-01 18:11:53,435][00194] Num frames 6100...
[2024-09-01 18:11:53,656][00194] Num frames 6200...
[2024-09-01 18:11:53,877][00194] Num frames 6300...
[2024-09-01 18:11:54,108][00194] Num frames 6400...
[2024-09-01 18:11:54,340][00194] Num frames 6500...
[2024-09-01 18:11:54,555][00194] Num frames 6600...
[2024-09-01 18:11:54,777][00194] Num frames 6700...
[2024-09-01 18:11:55,000][00194] Num frames 6800...
[2024-09-01 18:11:55,233][00194] Num frames 6900...
[2024-09-01 18:11:55,448][00194] Num frames 7000...
[2024-09-01 18:11:55,668][00194] Num frames 7100...
[2024-09-01 18:11:55,878][00194] Num frames 7200...
[2024-09-01 18:11:56,038][00194] Avg episode rewards: #0: 24.637, true rewards: #0: 10.351
[2024-09-01 18:11:56,040][00194] Avg episode reward: 24.637, avg true_objective: 10.351
[2024-09-01 18:11:56,170][00194] Num frames 7300...
[2024-09-01 18:11:56,380][00194] Num frames 7400...
[2024-09-01 18:11:56,602][00194] Num frames 7500...
[2024-09-01 18:11:56,812][00194] Num frames 7600...
[2024-09-01 18:11:57,023][00194] Num frames 7700...
[2024-09-01 18:11:57,268][00194] Num frames 7800...
[2024-09-01 18:11:57,487][00194] Num frames 7900...
[2024-09-01 18:11:57,722][00194] Num frames 8000...
[2024-09-01 18:11:57,956][00194] Num frames 8100...
[2024-09-01 18:11:58,193][00194] Num frames 8200...
[2024-09-01 18:11:58,425][00194] Num frames 8300...
[2024-09-01 18:11:58,640][00194] Num frames 8400...
[2024-09-01 18:11:58,850][00194] Num frames 8500...
[2024-09-01 18:11:59,069][00194] Num frames 8600...
[2024-09-01 18:11:59,285][00194] Num frames 8700...
[2024-09-01 18:11:59,488][00194] Num frames 8800...
[2024-09-01 18:11:59,710][00194] Avg episode rewards: #0: 26.972, true rewards: #0: 11.097
[2024-09-01 18:11:59,711][00194] Avg episode reward: 26.972, avg true_objective: 11.097
[2024-09-01 18:11:59,758][00194] Num frames 8900...
[2024-09-01 18:11:59,995][00194] Num frames 9000...
[2024-09-01 18:12:00,319][00194] Num frames 9100...
[2024-09-01 18:12:00,590][00194] Num frames 9200...
[2024-09-01 18:12:00,872][00194] Num frames 9300...
[2024-09-01 18:12:01,151][00194] Num frames 9400...
[2024-09-01 18:12:01,434][00194] Num frames 9500...
[2024-09-01 18:12:01,727][00194] Num frames 9600...
[2024-09-01 18:12:02,050][00194] Num frames 9700...
[2024-09-01 18:12:02,386][00194] Num frames 9800...
[2024-09-01 18:12:02,696][00194] Num frames 9900...
[2024-09-01 18:12:03,013][00194] Num frames 10000...
[2024-09-01 18:12:03,293][00194] Num frames 10100...
[2024-09-01 18:12:03,519][00194] Num frames 10200...
[2024-09-01 18:12:03,735][00194] Num frames 10300...
[2024-09-01 18:12:03,973][00194] Num frames 10400...
[2024-09-01 18:12:04,211][00194] Num frames 10500...
[2024-09-01 18:12:04,451][00194] Num frames 10600...
[2024-09-01 18:12:04,551][00194] Avg episode rewards: #0: 28.908, true rewards: #0: 11.797
[2024-09-01 18:12:04,553][00194] Avg episode reward: 28.908, avg true_objective: 11.797
[2024-09-01 18:12:04,735][00194] Num frames 10700...
[2024-09-01 18:12:04,952][00194] Num frames 10800...
[2024-09-01 18:12:05,173][00194] Num frames 10900...
[2024-09-01 18:12:05,378][00194] Num frames 11000...
[2024-09-01 18:12:05,604][00194] Num frames 11100...
[2024-09-01 18:12:05,821][00194] Num frames 11200...
[2024-09-01 18:12:06,025][00194] Num frames 11300...
[2024-09-01 18:12:06,240][00194] Num frames 11400...
[2024-09-01 18:12:06,449][00194] Num frames 11500...
[2024-09-01 18:12:06,664][00194] Num frames 11600...
[2024-09-01 18:12:06,878][00194] Num frames 11700...
[2024-09-01 18:12:07,103][00194] Num frames 11800...
[2024-09-01 18:12:07,318][00194] Num frames 11900...
[2024-09-01 18:12:07,539][00194] Num frames 12000...
[2024-09-01 18:12:07,751][00194] Num frames 12100...
[2024-09-01 18:12:07,865][00194] Avg episode rewards: #0: 30.226, true rewards: #0: 12.126
[2024-09-01 18:12:07,867][00194] Avg episode reward: 30.226, avg true_objective: 12.126
[2024-09-01 18:13:29,016][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
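The evaluation pass above reports a running average after each of the 10 episodes, so the last "Avg episode rewards" line is the final score for the run. A sketch of pulling those pairs out of the log text (`extract_rewards` is a hypothetical helper, written against the exact line format seen above):

```python
import re

# Running-average lines emitted during evaluation look like:
# "Avg episode rewards: #0: 30.226, true rewards: #0: 12.126"
REWARD_RE = re.compile(
    r"Avg episode rewards: #0: ([\d.]+), true rewards: #0: ([\d.]+)"
)

def extract_rewards(log_text):
    """Return (avg_reward, avg_true_reward) pairs in order of appearance."""
    return [(float(a), float(b)) for a, b in REWARD_RE.findall(log_text)]

sample = ("[2024-09-01 18:12:07,865][00194] "
          "Avg episode rewards: #0: 30.226, true rewards: #0: 12.126")
assert extract_rewards(sample) == [(30.226, 12.126)]
```

Run over the full evaluation section, the last pair returned is the 10-episode average (30.226 shaped reward, 12.126 true objective) reported just before the replay video is saved.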