[2024-09-01 14:50:18,637][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-01 14:50:18,643][00194] Rollout worker 0 uses device cpu
[2024-09-01 14:50:18,645][00194] Rollout worker 1 uses device cpu
[2024-09-01 14:50:18,646][00194] Rollout worker 2 uses device cpu
[2024-09-01 14:50:18,648][00194] Rollout worker 3 uses device cpu
[2024-09-01 14:50:18,649][00194] Rollout worker 4 uses device cpu
[2024-09-01 14:50:18,651][00194] Rollout worker 5 uses device cpu
[2024-09-01 14:50:18,653][00194] Rollout worker 6 uses device cpu
[2024-09-01 14:50:18,654][00194] Rollout worker 7 uses device cpu
[2024-09-01 14:50:18,826][00194] InferenceWorker_p0-w0: min num requests: 2
[2024-09-01 14:50:18,874][00194] Starting all processes...
[2024-09-01 14:50:18,879][00194] Starting process learner_proc0
[2024-09-01 14:50:18,932][00194] Starting all processes...
[2024-09-01 14:50:18,945][00194] Starting process inference_proc0-0
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc0
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc1
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc2
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc3
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc4
[2024-09-01 14:50:18,946][00194] Starting process rollout_proc5
[2024-09-01 14:50:18,947][00194] Starting process rollout_proc6
[2024-09-01 14:50:18,947][00194] Starting process rollout_proc7
[2024-09-01 14:50:32,730][03021] Starting seed is not provided
[2024-09-01 14:50:32,732][03021] Initializing actor-critic model on device cpu
[2024-09-01 14:50:32,733][03021] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 14:50:32,735][03021] RunningMeanStd input shape: (1,)
[2024-09-01 14:50:32,820][03021] ConvEncoder: input_channels=3
[2024-09-01 14:50:33,363][03035] Worker 0 uses CPU cores [0]
[2024-09-01 14:50:33,505][03042] Worker 7 uses CPU cores [1]
[2024-09-01 14:50:33,519][03038] Worker 3 uses CPU cores [1]
[2024-09-01 14:50:33,572][03041] Worker 6 uses CPU cores [0]
[2024-09-01 14:50:33,653][03039] Worker 4 uses CPU cores [0]
[2024-09-01 14:50:33,669][03037] Worker 2 uses CPU cores [0]
[2024-09-01 14:50:33,694][03021] Conv encoder output size: 512
[2024-09-01 14:50:33,696][03021] Policy head output size: 512
[2024-09-01 14:50:33,724][03021] Created Actor Critic model with architecture:
[2024-09-01 14:50:33,728][03036] Worker 1 uses CPU cores [1]
[2024-09-01 14:50:33,726][03021] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 14:50:33,777][03040] Worker 5 uses CPU cores [1]
[2024-09-01 14:50:34,292][03021] Using optimizer
[2024-09-01 14:50:34,293][03021] No checkpoints found
[2024-09-01 14:50:34,294][03021] Did not load from checkpoint, starting from scratch!
[2024-09-01 14:50:34,294][03021] Initialized policy 0 weights for model version 0
[2024-09-01 14:50:34,297][03021] LearnerWorker_p0 finished initialization!
[2024-09-01 14:50:34,305][03034] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 14:50:34,307][03034] RunningMeanStd input shape: (1,)
[2024-09-01 14:50:34,333][03034] ConvEncoder: input_channels=3
[2024-09-01 14:50:34,490][03034] Conv encoder output size: 512
[2024-09-01 14:50:34,490][03034] Policy head output size: 512
[2024-09-01 14:50:34,512][00194] Inference worker 0-0 is ready!
[2024-09-01 14:50:34,514][00194] All inference workers are ready! Signal rollout workers to start!
[2024-09-01 14:50:34,598][03038] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,599][03040] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,601][03042] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,597][03036] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,613][03035] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,610][03039] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,625][03037] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:34,627][03041] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 14:50:35,136][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:35,670][03037] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,360][03038] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,356][03040] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,363][03042] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,364][03036] Decorrelating experience for 0 frames...
[2024-09-01 14:50:36,675][03037] Decorrelating experience for 32 frames...
[2024-09-01 14:50:36,762][03041] Decorrelating experience for 0 frames...
[2024-09-01 14:50:37,247][03038] Decorrelating experience for 32 frames...
[2024-09-01 14:50:37,249][03036] Decorrelating experience for 32 frames...
[2024-09-01 14:50:37,842][03042] Decorrelating experience for 32 frames...
[2024-09-01 14:50:37,948][03039] Decorrelating experience for 0 frames...
[2024-09-01 14:50:38,398][03041] Decorrelating experience for 32 frames...
[2024-09-01 14:50:38,595][03037] Decorrelating experience for 64 frames...
[2024-09-01 14:50:38,817][00194] Heartbeat connected on Batcher_0
[2024-09-01 14:50:38,824][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 14:50:38,886][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 14:50:39,168][03036] Decorrelating experience for 64 frames...
[2024-09-01 14:50:39,403][03035] Decorrelating experience for 0 frames...
[2024-09-01 14:50:39,515][03042] Decorrelating experience for 64 frames...
[2024-09-01 14:50:40,136][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:40,414][03039] Decorrelating experience for 32 frames...
[2024-09-01 14:50:40,550][03038] Decorrelating experience for 64 frames...
[2024-09-01 14:50:41,222][03037] Decorrelating experience for 96 frames...
[2024-09-01 14:50:41,279][03040] Decorrelating experience for 32 frames...
[2024-09-01 14:50:41,444][03041] Decorrelating experience for 64 frames...
[2024-09-01 14:50:41,490][03042] Decorrelating experience for 96 frames...
[2024-09-01 14:50:41,575][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 14:50:41,671][03035] Decorrelating experience for 32 frames...
[2024-09-01 14:50:41,777][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 14:50:42,279][03038] Decorrelating experience for 96 frames...
[2024-09-01 14:50:42,726][03039] Decorrelating experience for 64 frames...
[2024-09-01 14:50:42,873][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 14:50:43,857][03036] Decorrelating experience for 96 frames...
[2024-09-01 14:50:44,758][00194] Heartbeat connected on RolloutWorker_w1
[2024-09-01 14:50:45,136][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 39.4. Samples: 394. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:45,144][00194] Avg episode reward: [(0, '1.813')]
[2024-09-01 14:50:46,376][03041] Decorrelating experience for 96 frames...
[2024-09-01 14:50:46,844][03039] Decorrelating experience for 96 frames...
[2024-09-01 14:50:47,585][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 14:50:48,589][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 14:50:48,663][03040] Decorrelating experience for 64 frames...
[2024-09-01 14:50:50,136][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 104.9. Samples: 1574. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-01 14:50:50,140][00194] Avg episode reward: [(0, '2.296')]
[2024-09-01 14:50:52,661][03035] Decorrelating experience for 64 frames...
[2024-09-01 14:50:53,342][03021] Signal inference workers to stop experience collection...
[2024-09-01 14:50:53,395][03034] InferenceWorker_p0-w0: stopping experience collection
[2024-09-01 14:50:53,479][03040] Decorrelating experience for 96 frames...
[2024-09-01 14:50:53,575][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 14:50:53,977][03035] Decorrelating experience for 96 frames...
[2024-09-01 14:50:54,080][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 14:50:54,301][03021] Signal inference workers to resume experience collection...
[2024-09-01 14:50:54,302][03034] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 14:50:55,136][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 109.6. Samples: 2192. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 14:50:55,138][00194] Avg episode reward: [(0, '2.547')]
[2024-09-01 14:51:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 8192. Throughput: 0: 141.1. Samples: 3528. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 14:51:00,142][00194] Avg episode reward: [(0, '3.212')]
[2024-09-01 14:51:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 12288. Throughput: 0: 164.7. Samples: 4940. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:05,139][00194] Avg episode reward: [(0, '3.274')]
[2024-09-01 14:51:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 16384. Throughput: 0: 159.8. Samples: 5592. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:10,141][00194] Avg episode reward: [(0, '3.525')]
[2024-09-01 14:51:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 20480. Throughput: 0: 174.2. Samples: 6970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:15,144][00194] Avg episode reward: [(0, '3.818')]
[2024-09-01 14:51:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 24576. Throughput: 0: 192.3. Samples: 8654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:51:20,140][00194] Avg episode reward: [(0, '3.829')]
[2024-09-01 14:51:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 28672. Throughput: 0: 201.0. Samples: 9046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:25,145][00194] Avg episode reward: [(0, '3.873')]
[2024-09-01 14:51:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 670.3, 300 sec: 670.3). Total num frames: 36864. Throughput: 0: 223.3. Samples: 10442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:30,138][00194] Avg episode reward: [(0, '3.971')]
[2024-09-01 14:51:34,366][03034] Updated weights for policy 0, policy_version 10 (0.2578)
[2024-09-01 14:51:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 40960. Throughput: 0: 229.3. Samples: 11892. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:35,139][00194] Avg episode reward: [(0, '4.138')]
[2024-09-01 14:51:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 693.2). Total num frames: 45056. Throughput: 0: 236.0. Samples: 12814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:40,141][00194] Avg episode reward: [(0, '4.322')]
[2024-09-01 14:51:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 49152. Throughput: 0: 228.5. Samples: 13812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:45,139][00194] Avg episode reward: [(0, '4.340')]
[2024-09-01 14:51:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 710.0). Total num frames: 53248. Throughput: 0: 229.9. Samples: 15286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:50,145][00194] Avg episode reward: [(0, '4.382')]
[2024-09-01 14:51:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 716.8). Total num frames: 57344. Throughput: 0: 235.4. Samples: 16184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:51:55,142][00194] Avg episode reward: [(0, '4.435')]
[2024-09-01 14:52:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 722.8). Total num frames: 61440. Throughput: 0: 232.4. Samples: 17426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:00,143][00194] Avg episode reward: [(0, '4.432')]
[2024-09-01 14:52:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 65536. Throughput: 0: 229.6. Samples: 18986. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:05,147][00194] Avg episode reward: [(0, '4.425')]
[2024-09-01 14:52:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 776.1). Total num frames: 73728. Throughput: 0: 236.8. Samples: 19700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:10,138][00194] Avg episode reward: [(0, '4.491')]
[2024-09-01 14:52:15,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 778.2). Total num frames: 77824. Throughput: 0: 235.0. Samples: 21018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:15,140][00194] Avg episode reward: [(0, '4.481')]
[2024-09-01 14:52:19,754][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000020_81920.pth...
[2024-09-01 14:52:19,758][03034] Updated weights for policy 0, policy_version 20 (0.0527)
[2024-09-01 14:52:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 780.2). Total num frames: 81920. Throughput: 0: 225.6. Samples: 22042. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:20,138][00194] Avg episode reward: [(0, '4.519')]
[2024-09-01 14:52:25,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 782.0). Total num frames: 86016. Throughput: 0: 227.3. Samples: 23044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:25,139][00194] Avg episode reward: [(0, '4.486')]
[2024-09-01 14:52:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 783.6). Total num frames: 90112. Throughput: 0: 240.9. Samples: 24652. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:30,141][00194] Avg episode reward: [(0, '4.496')]
[2024-09-01 14:52:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 785.1). Total num frames: 94208. Throughput: 0: 233.6. Samples: 25800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:35,141][00194] Avg episode reward: [(0, '4.528')]
[2024-09-01 14:52:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 786.4). Total num frames: 98304. Throughput: 0: 223.5. Samples: 26242. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:52:40,145][00194] Avg episode reward: [(0, '4.456')]
[2024-09-01 14:52:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 106496. Throughput: 0: 238.5. Samples: 28160. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:45,151][00194] Avg episode reward: [(0, '4.433')]
[2024-09-01 14:52:49,237][03021] Saving new best policy, reward=4.433!
[2024-09-01 14:52:50,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 110592. Throughput: 0: 228.7. Samples: 29280. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:50,145][00194] Avg episode reward: [(0, '4.521')]
[2024-09-01 14:52:54,932][03021] Saving new best policy, reward=4.521!
[2024-09-01 14:52:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 114688. Throughput: 0: 228.9. Samples: 30000. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:52:55,143][00194] Avg episode reward: [(0, '4.529')]
[2024-09-01 14:52:58,673][03021] Saving new best policy, reward=4.529!
[2024-09-01 14:53:00,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 118784. Throughput: 0: 227.0. Samples: 31234. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:00,138][00194] Avg episode reward: [(0, '4.515')]
[2024-09-01 14:53:02,571][03034] Updated weights for policy 0, policy_version 30 (0.0582)
[2024-09-01 14:53:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 122880. Throughput: 0: 246.7. Samples: 33144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:05,141][00194] Avg episode reward: [(0, '4.548')]
[2024-09-01 14:53:10,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 126976. Throughput: 0: 231.1. Samples: 33446. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:10,146][00194] Avg episode reward: [(0, '4.506')]
[2024-09-01 14:53:12,627][03021] Saving new best policy, reward=4.548!
[2024-09-01 14:53:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 131072. Throughput: 0: 222.7. Samples: 34672. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:15,144][00194] Avg episode reward: [(0, '4.561')]
[2024-09-01 14:53:20,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 135168. Throughput: 0: 236.1. Samples: 36424. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:53:20,143][00194] Avg episode reward: [(0, '4.552')]
[2024-09-01 14:53:20,380][03021] Saving new best policy, reward=4.561!
[2024-09-01 14:53:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 843.3). Total num frames: 143360. Throughput: 0: 247.9. Samples: 37396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:25,139][00194] Avg episode reward: [(0, '4.575')]
[2024-09-01 14:53:30,140][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 143360. Throughput: 0: 212.9. Samples: 37740. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:30,158][00194] Avg episode reward: [(0, '4.604')]
[2024-09-01 14:53:33,783][03021] Saving new best policy, reward=4.575!
[2024-09-01 14:53:33,922][03021] Saving new best policy, reward=4.604!
[2024-09-01 14:53:35,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 147456. Throughput: 0: 210.5. Samples: 38752. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:35,142][00194] Avg episode reward: [(0, '4.503')]
[2024-09-01 14:53:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 151552. Throughput: 0: 211.1. Samples: 39500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:40,139][00194] Avg episode reward: [(0, '4.544')]
[2024-09-01 14:53:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 155648. Throughput: 0: 215.8. Samples: 40944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:53:45,146][00194] Avg episode reward: [(0, '4.456')]
[2024-09-01 14:53:50,142][00194] Fps is (10 sec: 818.7, 60 sec: 819.1, 300 sec: 819.2). Total num frames: 159744. Throughput: 0: 196.4. Samples: 41982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:53:50,146][00194] Avg episode reward: [(0, '4.465')]
[2024-09-01 14:53:52,315][03034] Updated weights for policy 0, policy_version 40 (0.1638)
[2024-09-01 14:53:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 163840. Throughput: 0: 204.9. Samples: 42668. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:53:55,143][00194] Avg episode reward: [(0, '4.601')]
[2024-09-01 14:54:00,136][00194] Fps is (10 sec: 1229.6, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 172032. Throughput: 0: 220.0. Samples: 44574. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:54:00,142][00194] Avg episode reward: [(0, '4.532')]
[2024-09-01 14:54:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 172032. Throughput: 0: 203.4. Samples: 45576. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 14:54:05,138][00194] Avg episode reward: [(0, '4.487')]
[2024-09-01 14:54:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.3). Total num frames: 180224. Throughput: 0: 194.0. Samples: 46128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:10,146][00194] Avg episode reward: [(0, '4.467')]
[2024-09-01 14:54:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 837.8). Total num frames: 184320. Throughput: 0: 219.1. Samples: 47598. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:15,143][00194] Avg episode reward: [(0, '4.396')]
[2024-09-01 14:54:17,607][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000046_188416.pth...
[2024-09-01 14:54:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 837.4). Total num frames: 188416. Throughput: 0: 231.2. Samples: 49154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:20,143][00194] Avg episode reward: [(0, '4.467')]
[2024-09-01 14:54:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 837.0). Total num frames: 192512. Throughput: 0: 223.1. Samples: 49540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:25,144][00194] Avg episode reward: [(0, '4.449')]
[2024-09-01 14:54:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 836.6). Total num frames: 196608. Throughput: 0: 216.7. Samples: 50696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:54:30,139][00194] Avg episode reward: [(0, '4.457')]
[2024-09-01 14:54:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 836.3). Total num frames: 200704. Throughput: 0: 236.7. Samples: 52630. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:35,139][00194] Avg episode reward: [(0, '4.421')]
[2024-09-01 14:54:36,014][03034] Updated weights for policy 0, policy_version 50 (0.1682)
[2024-09-01 14:54:40,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 835.9). Total num frames: 204800. Throughput: 0: 229.6. Samples: 53002. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:40,148][00194] Avg episode reward: [(0, '4.315')]
[2024-09-01 14:54:40,302][03021] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 14:54:40,427][03034] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 14:54:41,141][03021] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 14:54:41,142][03034] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 14:54:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 835.6). Total num frames: 208896. Throughput: 0: 215.8. Samples: 54286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:45,141][00194] Avg episode reward: [(0, '4.328')]
[2024-09-01 14:54:50,136][00194] Fps is (10 sec: 1229.6, 60 sec: 955.8, 300 sec: 851.3). Total num frames: 217088. Throughput: 0: 225.6. Samples: 55728. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:50,140][00194] Avg episode reward: [(0, '4.300')]
[2024-09-01 14:54:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 850.7). Total num frames: 221184. Throughput: 0: 235.6. Samples: 56730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:54:55,144][00194] Avg episode reward: [(0, '4.263')]
[2024-09-01 14:55:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.1). Total num frames: 225280. Throughput: 0: 225.9. Samples: 57762. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:00,140][00194] Avg episode reward: [(0, '4.302')]
[2024-09-01 14:55:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 849.5). Total num frames: 229376. Throughput: 0: 222.1. Samples: 59150. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:05,138][00194] Avg episode reward: [(0, '4.240')]
[2024-09-01 14:55:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 849.0). Total num frames: 233472. Throughput: 0: 229.4. Samples: 59864. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:10,142][00194] Avg episode reward: [(0, '4.327')]
[2024-09-01 14:55:15,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 848.5). Total num frames: 237568. Throughput: 0: 239.0. Samples: 61452. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:15,140][00194] Avg episode reward: [(0, '4.340')]
[2024-09-01 14:55:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 847.9). Total num frames: 241664. Throughput: 0: 220.1. Samples: 62536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:20,139][00194] Avg episode reward: [(0, '4.356')]
[2024-09-01 14:55:21,561][03034] Updated weights for policy 0, policy_version 60 (0.0050)
[2024-09-01 14:55:25,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 847.4). Total num frames: 245760. Throughput: 0: 229.6. Samples: 63334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:25,145][00194] Avg episode reward: [(0, '4.346')]
[2024-09-01 14:55:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 860.9). Total num frames: 253952. Throughput: 0: 236.7. Samples: 64938. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:30,139][00194] Avg episode reward: [(0, '4.353')]
[2024-09-01 14:55:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 874.7). Total num frames: 258048. Throughput: 0: 228.0. Samples: 65990. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:35,139][00194] Avg episode reward: [(0, '4.448')]
[2024-09-01 14:55:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 888.6). Total num frames: 262144. Throughput: 0: 219.6. Samples: 66614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:40,140][00194] Avg episode reward: [(0, '4.677')]
[2024-09-01 14:55:43,065][03021] Saving new best policy, reward=4.677!
[2024-09-01 14:55:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 266240. Throughput: 0: 233.9. Samples: 68286. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:45,145][00194] Avg episode reward: [(0, '4.605')]
[2024-09-01 14:55:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 270336. Throughput: 0: 230.7. Samples: 69530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:55:50,140][00194] Avg episode reward: [(0, '4.674')]
[2024-09-01 14:55:55,141][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 274432. Throughput: 0: 227.6. Samples: 70106. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:55:55,149][00194] Avg episode reward: [(0, '4.661')]
[2024-09-01 14:56:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 278528. Throughput: 0: 226.2. Samples: 71630. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:00,144][00194] Avg episode reward: [(0, '4.700')]
[2024-09-01 14:56:04,897][03021] Saving new best policy, reward=4.700!
[2024-09-01 14:56:04,901][03034] Updated weights for policy 0, policy_version 70 (0.1759)
[2024-09-01 14:56:05,136][00194] Fps is (10 sec: 1229.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 286720. Throughput: 0: 237.5. Samples: 73224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:05,139][00194] Avg episode reward: [(0, '4.741')]
[2024-09-01 14:56:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 286720. Throughput: 0: 236.0. Samples: 73956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:10,139][00194] Avg episode reward: [(0, '4.723')]
[2024-09-01 14:56:10,600][03021] Saving new best policy, reward=4.741!
[2024-09-01 14:56:15,137][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 290816. Throughput: 0: 224.3. Samples: 75032. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:15,147][00194] Avg episode reward: [(0, '4.737')]
[2024-09-01 14:56:19,195][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2024-09-01 14:56:19,305][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000020_81920.pth
[2024-09-01 14:56:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 299008. Throughput: 0: 232.8. Samples: 76464. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:20,143][00194] Avg episode reward: [(0, '4.726')]
[2024-09-01 14:56:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 303104. Throughput: 0: 237.9. Samples: 77320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:25,139][00194] Avg episode reward: [(0, '4.575')]
[2024-09-01 14:56:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 307200. Throughput: 0: 221.5. Samples: 78252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:30,138][00194] Avg episode reward: [(0, '4.571')]
[2024-09-01 14:56:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 311296. Throughput: 0: 230.0. Samples: 79880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:56:35,145][00194] Avg episode reward: [(0, '4.496')]
[2024-09-01 14:56:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 315392. Throughput: 0: 229.3. Samples: 80424. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 14:56:40,143][00194] Avg episode reward: [(0, '4.360')]
[2024-09-01 14:56:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 319488. Throughput: 0: 231.6. Samples: 82050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 14:56:45,139][00194] Avg episode reward: [(0, '4.360')]
[2024-09-01 14:56:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 323584. Throughput: 0: 222.3. Samples: 83226. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:56:50,144][00194] Avg episode reward: [(0, '4.339')]
[2024-09-01 14:56:51,612][03034] Updated weights for policy 0, policy_version 80 (0.1042)
[2024-09-01 14:56:55,148][00194] Fps is (10 sec: 1227.4, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 331776. Throughput: 0: 218.5. Samples: 83792. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:56:55,153][00194] Avg episode reward: [(0, '4.330')]
[2024-09-01 14:57:00,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 335872. Throughput: 0: 232.3. Samples: 85486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:00,146][00194] Avg episode reward: [(0, '4.269')]
[2024-09-01 14:57:05,136][00194] Fps is (10 sec: 820.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 339968. Throughput: 0: 222.7. Samples: 86486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:05,143][00194] Avg episode reward: [(0, '4.248')]
[2024-09-01 14:57:10,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 344064. Throughput: 0: 219.2. Samples: 87186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:10,142][00194] Avg episode reward: [(0, '4.262')]
[2024-09-01 14:57:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 348160. Throughput: 0: 232.4. Samples: 88712. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:15,138][00194] Avg episode reward: [(0, '4.427')]
[2024-09-01 14:57:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 352256. Throughput: 0: 228.6. Samples: 90166. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:20,140][00194] Avg episode reward: [(0, '4.391')]
[2024-09-01 14:57:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 356352. Throughput: 0: 225.1. Samples: 90554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:25,139][00194] Avg episode reward: [(0, '4.470')]
[2024-09-01 14:57:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 360448. Throughput: 0: 225.0. Samples: 92176. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:30,144][00194] Avg episode reward: [(0, '4.480')]
[2024-09-01 14:57:34,731][03034] Updated weights for policy 0, policy_version 90 (0.1892)
[2024-09-01 14:57:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 368640. Throughput: 0: 232.0. Samples: 93668. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:35,139][00194] Avg episode reward: [(0, '4.562')]
[2024-09-01 14:57:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 368640. Throughput: 0: 234.6. Samples: 94344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:57:40,140][00194] Avg episode reward: [(0, '4.613')]
[2024-09-01 14:57:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 376832. Throughput: 0: 225.3. Samples: 95622. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:45,143][00194] Avg episode reward: [(0, '4.710')]
[2024-09-01 14:57:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 380928. Throughput: 0: 234.0. Samples: 97016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:50,139][00194] Avg episode reward: [(0, '4.616')]
[2024-09-01 14:57:55,140][00194] Fps is (10 sec: 818.9, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 385024. Throughput: 0: 234.4. Samples: 97734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:57:55,144][00194] Avg episode reward: [(0, '4.667')]
[2024-09-01 14:58:00,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 389120. Throughput: 0: 224.6. Samples: 98818. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:58:00,141][00194] Avg episode reward: [(0, '4.595')]
[2024-09-01 14:58:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 393216. Throughput: 0: 231.3. Samples: 100574. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 14:58:05,142][00194] Avg episode reward: [(0, '4.618')]
[2024-09-01 14:58:10,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 397312. Throughput: 0: 234.4. Samples: 101100. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:58:10,145][00194] Avg episode reward: [(0, '4.595')]
[2024-09-01 14:58:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 401408. Throughput: 0: 234.8. Samples: 102744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 14:58:15,141][00194] Avg episode reward: [(0, '4.580')]
[2024-09-01 14:58:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 405504. Throughput: 0: 226.3. Samples: 103850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 14:58:20,144][00194] Avg episode reward: [(0, '4.550')]
[2024-09-01 14:58:20,574][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth...
[2024-09-01 14:58:20,580][03034] Updated weights for policy 0, policy_version 100 (0.1151)
[2024-09-01 14:58:20,680][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000046_188416.pth
[2024-09-01 14:58:22,878][03021] Signal inference workers to stop experience collection...
(100 times) [2024-09-01 14:58:22,935][03034] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-09-01 14:58:24,391][03021] Signal inference workers to resume experience collection... (100 times) [2024-09-01 14:58:24,392][03034] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-09-01 14:58:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 413696. Throughput: 0: 230.1. Samples: 104698. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:25,141][00194] Avg episode reward: [(0, '4.563')] [2024-09-01 14:58:30,141][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 417792. Throughput: 0: 230.9. Samples: 106016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:30,152][00194] Avg episode reward: [(0, '4.566')] [2024-09-01 14:58:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 421888. Throughput: 0: 225.6. Samples: 107168. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 14:58:35,139][00194] Avg episode reward: [(0, '4.687')] [2024-09-01 14:58:40,136][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 425984. Throughput: 0: 228.2. Samples: 108000. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 14:58:40,143][00194] Avg episode reward: [(0, '4.638')] [2024-09-01 14:58:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 430080. Throughput: 0: 237.9. Samples: 109524. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:45,139][00194] Avg episode reward: [(0, '4.693')] [2024-09-01 14:58:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 434176. Throughput: 0: 230.9. Samples: 110966. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:58:50,141][00194] Avg episode reward: [(0, '4.693')] [2024-09-01 14:58:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 438272. Throughput: 0: 227.2. Samples: 111322. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:58:55,143][00194] Avg episode reward: [(0, '4.664')] [2024-09-01 14:59:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 442368. Throughput: 0: 222.4. Samples: 112750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:00,146][00194] Avg episode reward: [(0, '4.659')] [2024-09-01 14:59:04,236][03034] Updated weights for policy 0, policy_version 110 (0.1529) [2024-09-01 14:59:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 450560. Throughput: 0: 232.6. Samples: 114318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:05,140][00194] Avg episode reward: [(0, '4.645')] [2024-09-01 14:59:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 454656. Throughput: 0: 231.7. Samples: 115126. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:10,140][00194] Avg episode reward: [(0, '4.667')] [2024-09-01 14:59:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 458752. Throughput: 0: 227.5. Samples: 116250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:15,140][00194] Avg episode reward: [(0, '4.686')] [2024-09-01 14:59:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 462848. Throughput: 0: 235.5. Samples: 117764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:20,139][00194] Avg episode reward: [(0, '4.675')] [2024-09-01 14:59:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 466944. Throughput: 0: 229.6. Samples: 118330. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:25,146][00194] Avg episode reward: [(0, '4.773')] [2024-09-01 14:59:28,465][03021] Saving new best policy, reward=4.773! [2024-09-01 14:59:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 471040. Throughput: 0: 218.7. Samples: 119364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:30,142][00194] Avg episode reward: [(0, '4.739')] [2024-09-01 14:59:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 475136. Throughput: 0: 220.4. Samples: 120884. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:35,143][00194] Avg episode reward: [(0, '4.641')] [2024-09-01 14:59:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 479232. Throughput: 0: 228.1. Samples: 121586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:40,141][00194] Avg episode reward: [(0, '4.693')] [2024-09-01 14:59:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 483328. Throughput: 0: 229.9. Samples: 123094. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:45,139][00194] Avg episode reward: [(0, '4.684')] [2024-09-01 14:59:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 487424. Throughput: 0: 219.0. Samples: 124172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 14:59:50,143][00194] Avg episode reward: [(0, '4.749')] [2024-09-01 14:59:51,140][03034] Updated weights for policy 0, policy_version 120 (0.1018) [2024-09-01 14:59:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 495616. Throughput: 0: 218.4. Samples: 124956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 14:59:55,138][00194] Avg episode reward: [(0, '4.674')] [2024-09-01 15:00:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 499712. 
Throughput: 0: 224.7. Samples: 126362. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:00:00,142][00194] Avg episode reward: [(0, '4.494')] [2024-09-01 15:00:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 503808. Throughput: 0: 216.9. Samples: 127524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:05,140][00194] Avg episode reward: [(0, '4.494')] [2024-09-01 15:00:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 507904. Throughput: 0: 221.1. Samples: 128280. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:10,139][00194] Avg episode reward: [(0, '4.457')] [2024-09-01 15:00:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 512000. Throughput: 0: 231.3. Samples: 129772. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:15,138][00194] Avg episode reward: [(0, '4.464')] [2024-09-01 15:00:20,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 516096. Throughput: 0: 232.2. Samples: 131334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:20,145][00194] Avg episode reward: [(0, '4.520')] [2024-09-01 15:00:21,930][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000127_520192.pth... [2024-09-01 15:00:22,073][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth [2024-09-01 15:00:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 520192. Throughput: 0: 223.6. Samples: 131648. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:25,139][00194] Avg episode reward: [(0, '4.569')] [2024-09-01 15:00:30,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 524288. Throughput: 0: 225.9. Samples: 133258. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:30,141][00194] Avg episode reward: [(0, '4.595')] [2024-09-01 15:00:34,393][03034] Updated weights for policy 0, policy_version 130 (0.0539) [2024-09-01 15:00:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 532480. Throughput: 0: 232.9. Samples: 134654. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:35,138][00194] Avg episode reward: [(0, '4.611')] [2024-09-01 15:00:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 536576. Throughput: 0: 233.0. Samples: 135442. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:40,140][00194] Avg episode reward: [(0, '4.575')] [2024-09-01 15:00:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 540672. Throughput: 0: 227.2. Samples: 136588. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:45,144][00194] Avg episode reward: [(0, '4.644')] [2024-09-01 15:00:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 544768. Throughput: 0: 234.0. Samples: 138056. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:50,138][00194] Avg episode reward: [(0, '4.662')] [2024-09-01 15:00:55,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 548864. Throughput: 0: 233.1. Samples: 138768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:00:55,144][00194] Avg episode reward: [(0, '4.767')] [2024-09-01 15:01:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 552960. Throughput: 0: 222.2. Samples: 139772. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:00,142][00194] Avg episode reward: [(0, '4.729')] [2024-09-01 15:01:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 557056. Throughput: 0: 227.7. Samples: 141580. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:05,139][00194] Avg episode reward: [(0, '4.637')] [2024-09-01 15:01:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 565248. Throughput: 0: 236.0. Samples: 142268. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:01:10,138][00194] Avg episode reward: [(0, '4.574')] [2024-09-01 15:01:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 565248. Throughput: 0: 233.5. Samples: 143766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:01:15,139][00194] Avg episode reward: [(0, '4.515')] [2024-09-01 15:01:20,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 569344. Throughput: 0: 225.2. Samples: 144786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:20,144][00194] Avg episode reward: [(0, '4.487')] [2024-09-01 15:01:20,351][03034] Updated weights for policy 0, policy_version 140 (0.1177) [2024-09-01 15:01:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 577536. Throughput: 0: 229.0. Samples: 145746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:25,144][00194] Avg episode reward: [(0, '4.569')] [2024-09-01 15:01:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 581632. Throughput: 0: 235.0. Samples: 147164. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:30,143][00194] Avg episode reward: [(0, '4.565')] [2024-09-01 15:01:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 585728. Throughput: 0: 226.4. Samples: 148242. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:35,147][00194] Avg episode reward: [(0, '4.511')] [2024-09-01 15:01:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 589824. Throughput: 0: 225.3. Samples: 148904. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:40,141][00194] Avg episode reward: [(0, '4.510')] [2024-09-01 15:01:45,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 593920. Throughput: 0: 238.1. Samples: 150488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:45,142][00194] Avg episode reward: [(0, '4.465')] [2024-09-01 15:01:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 598016. Throughput: 0: 230.5. Samples: 151954. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:50,142][00194] Avg episode reward: [(0, '4.498')] [2024-09-01 15:01:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 602112. Throughput: 0: 223.6. Samples: 152328. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:01:55,139][00194] Avg episode reward: [(0, '4.447')] [2024-09-01 15:02:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 610304. Throughput: 0: 227.8. Samples: 154016. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:00,139][00194] Avg episode reward: [(0, '4.431')] [2024-09-01 15:02:03,616][03034] Updated weights for policy 0, policy_version 150 (0.1017) [2024-09-01 15:02:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 614400. Throughput: 0: 238.2. Samples: 155506. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:05,138][00194] Avg episode reward: [(0, '4.470')] [2024-09-01 15:02:07,017][03021] Signal inference workers to stop experience collection... (150 times) [2024-09-01 15:02:07,091][03034] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-09-01 15:02:08,870][03021] Signal inference workers to resume experience collection... 
(150 times) [2024-09-01 15:02:08,878][03034] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-09-01 15:02:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 618496. Throughput: 0: 229.7. Samples: 156084. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:10,143][00194] Avg episode reward: [(0, '4.466')] [2024-09-01 15:02:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 622592. Throughput: 0: 220.9. Samples: 157106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:15,138][00194] Avg episode reward: [(0, '4.473')] [2024-09-01 15:02:17,310][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth... [2024-09-01 15:02:17,393][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth [2024-09-01 15:02:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 626688. Throughput: 0: 238.5. Samples: 158976. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:02:20,139][00194] Avg episode reward: [(0, '4.478')] [2024-09-01 15:02:25,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 630784. Throughput: 0: 235.4. Samples: 159500. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:02:25,143][00194] Avg episode reward: [(0, '4.544')] [2024-09-01 15:02:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 634880. Throughput: 0: 223.4. Samples: 160542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:30,144][00194] Avg episode reward: [(0, '4.618')] [2024-09-01 15:02:35,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 638976. Throughput: 0: 229.2. Samples: 162266. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:35,138][00194] Avg episode reward: [(0, '4.712')] [2024-09-01 15:02:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 647168. Throughput: 0: 239.9. Samples: 163124. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:02:40,143][00194] Avg episode reward: [(0, '4.754')] [2024-09-01 15:02:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 651264. Throughput: 0: 227.9. Samples: 164272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:02:45,140][00194] Avg episode reward: [(0, '4.803')] [2024-09-01 15:02:49,529][03021] Saving new best policy, reward=4.803! [2024-09-01 15:02:49,534][03034] Updated weights for policy 0, policy_version 160 (0.1163) [2024-09-01 15:02:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 655360. Throughput: 0: 217.7. Samples: 165302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:02:50,138][00194] Avg episode reward: [(0, '4.753')] [2024-09-01 15:02:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 659456. Throughput: 0: 227.1. Samples: 166302. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:02:55,142][00194] Avg episode reward: [(0, '4.763')] [2024-09-01 15:03:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 663552. Throughput: 0: 235.3. Samples: 167694. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:00,145][00194] Avg episode reward: [(0, '4.789')] [2024-09-01 15:03:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 667648. Throughput: 0: 218.0. Samples: 168786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:03:05,142][00194] Avg episode reward: [(0, '4.809')] [2024-09-01 15:03:07,580][03021] Saving new best policy, reward=4.809! 
[2024-09-01 15:03:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 671744. Throughput: 0: 219.8. Samples: 169390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:03:10,146][00194] Avg episode reward: [(0, '4.767')]
[2024-09-01 15:03:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 675840. Throughput: 0: 239.8. Samples: 171332. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:03:15,140][00194] Avg episode reward: [(0, '4.704')]
[2024-09-01 15:03:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 679936. Throughput: 0: 227.6. Samples: 172508. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:03:20,141][00194] Avg episode reward: [(0, '4.710')]
[2024-09-01 15:03:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 684032. Throughput: 0: 217.0. Samples: 172890. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:03:25,145][00194] Avg episode reward: [(0, '4.687')]
[2024-09-01 15:03:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 692224. Throughput: 0: 228.2. Samples: 174542. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:03:30,144][00194] Avg episode reward: [(0, '4.635')]
[2024-09-01 15:03:33,067][03034] Updated weights for policy 0, policy_version 170 (0.0047)
[2024-09-01 15:03:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 696320. Throughput: 0: 237.3. Samples: 175980. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:03:35,138][00194] Avg episode reward: [(0, '4.679')]
[2024-09-01 15:03:40,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 696320. Throughput: 0: 224.8. Samples: 176420. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:03:40,147][00194] Avg episode reward: [(0, '4.692')]
[2024-09-01 15:03:45,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 700416. Throughput: 0: 204.2. Samples: 176884. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:03:45,140][00194] Avg episode reward: [(0, '4.706')]
[2024-09-01 15:03:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 704512. Throughput: 0: 215.0. Samples: 178462. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:03:50,138][00194] Avg episode reward: [(0, '4.624')]
[2024-09-01 15:03:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 712704. Throughput: 0: 215.8. Samples: 179100. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:03:55,145][00194] Avg episode reward: [(0, '4.634')]
[2024-09-01 15:04:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 712704. Throughput: 0: 204.0. Samples: 180512. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:00,139][00194] Avg episode reward: [(0, '4.586')]
[2024-09-01 15:04:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 720896. Throughput: 0: 202.8. Samples: 181632. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:05,141][00194] Avg episode reward: [(0, '4.599')]
[2024-09-01 15:04:10,137][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 724992. Throughput: 0: 218.0. Samples: 182700. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:10,142][00194] Avg episode reward: [(0, '4.503')]
[2024-09-01 15:04:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 729088. Throughput: 0: 209.5. Samples: 183970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:15,142][00194] Avg episode reward: [(0, '4.625')]
[2024-09-01 15:04:18,125][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
[2024-09-01 15:04:18,227][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000127_520192.pth
[2024-09-01 15:04:20,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 733184. Throughput: 0: 201.3. Samples: 185038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:20,139][00194] Avg episode reward: [(0, '4.690')]
[2024-09-01 15:04:22,805][03034] Updated weights for policy 0, policy_version 180 (0.1020)
[2024-09-01 15:04:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 737280. Throughput: 0: 206.0. Samples: 185692. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:25,139][00194] Avg episode reward: [(0, '4.697')]
[2024-09-01 15:04:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 741376. Throughput: 0: 233.2. Samples: 187380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:30,139][00194] Avg episode reward: [(0, '4.765')]
[2024-09-01 15:04:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 745472. Throughput: 0: 228.0. Samples: 188724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:04:35,138][00194] Avg episode reward: [(0, '4.752')]
[2024-09-01 15:04:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 749568. Throughput: 0: 221.6. Samples: 189072. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:04:40,138][00194] Avg episode reward: [(0, '4.782')]
[2024-09-01 15:04:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 757760. Throughput: 0: 228.0. Samples: 190772. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:04:45,138][00194] Avg episode reward: [(0, '4.886')]
[2024-09-01 15:04:48,333][03021] Saving new best policy, reward=4.886!
[2024-09-01 15:04:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 761856. Throughput: 0: 234.7. Samples: 192194. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:04:50,142][00194] Avg episode reward: [(0, '4.834')]
[2024-09-01 15:04:55,139][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 765952. Throughput: 0: 226.4. Samples: 192890. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:04:55,144][00194] Avg episode reward: [(0, '4.818')]
[2024-09-01 15:05:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 770048. Throughput: 0: 222.3. Samples: 193974. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:05:00,139][00194] Avg episode reward: [(0, '4.824')]
[2024-09-01 15:05:05,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 774144. Throughput: 0: 239.6. Samples: 195820. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:05,143][00194] Avg episode reward: [(0, '4.840')]
[2024-09-01 15:05:06,480][03034] Updated weights for policy 0, policy_version 190 (0.1028)
[2024-09-01 15:05:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 778240. Throughput: 0: 234.8. Samples: 196256. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:10,146][00194] Avg episode reward: [(0, '4.802')]
[2024-09-01 15:05:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 782336. Throughput: 0: 222.0. Samples: 197368. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:15,138][00194] Avg episode reward: [(0, '4.795')]
[2024-09-01 15:05:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 786432. Throughput: 0: 231.2. Samples: 199130. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:20,138][00194] Avg episode reward: [(0, '4.893')]
[2024-09-01 15:05:24,273][03021] Saving new best policy, reward=4.893!
[2024-09-01 15:05:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 794624. Throughput: 0: 242.6. Samples: 199990. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:25,138][00194] Avg episode reward: [(0, '4.792')]
[2024-09-01 15:05:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 798720. Throughput: 0: 230.7. Samples: 201154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:05:30,145][00194] Avg episode reward: [(0, '4.761')]
[2024-09-01 15:05:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 802816. Throughput: 0: 222.8. Samples: 202222. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:05:35,139][00194] Avg episode reward: [(0, '4.781')]
[2024-09-01 15:05:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 806912. Throughput: 0: 229.4. Samples: 203212. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:40,138][00194] Avg episode reward: [(0, '4.901')]
[2024-09-01 15:05:42,114][03021] Saving new best policy, reward=4.901!
[2024-09-01 15:05:45,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 811008. Throughput: 0: 236.3. Samples: 204608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:45,140][00194] Avg episode reward: [(0, '4.914')]
[2024-09-01 15:05:48,029][03021] Saving new best policy, reward=4.914!
[2024-09-01 15:05:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 815104. Throughput: 0: 218.4. Samples: 205650. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:50,139][00194] Avg episode reward: [(0, '4.971')]
[2024-09-01 15:05:52,732][03021] Saving new best policy, reward=4.971!
[2024-09-01 15:05:52,737][03034] Updated weights for policy 0, policy_version 200 (0.0549)
[2024-09-01 15:05:54,974][03021] Signal inference workers to stop experience collection... (200 times)
[2024-09-01 15:05:55,011][03034] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-09-01 15:05:55,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 819200. Throughput: 0: 223.1. Samples: 206294. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:05:55,138][00194] Avg episode reward: [(0, '4.972')]
[2024-09-01 15:05:56,461][03021] Signal inference workers to resume experience collection... (200 times)
[2024-09-01 15:05:56,462][03034] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-09-01 15:06:00,144][00194] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 823296. Throughput: 0: 240.2. Samples: 208180. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:00,150][00194] Avg episode reward: [(0, '4.914')]
[2024-09-01 15:06:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 827392. Throughput: 0: 227.9. Samples: 209384. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:05,139][00194] Avg episode reward: [(0, '4.894')]
[2024-09-01 15:06:10,136][00194] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 831488. Throughput: 0: 217.1. Samples: 209760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:10,144][00194] Avg episode reward: [(0, '4.865')]
[2024-09-01 15:06:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 839680. Throughput: 0: 227.7. Samples: 211400. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:15,138][00194] Avg episode reward: [(0, '4.890')]
[2024-09-01 15:06:18,004][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000206_843776.pth...
[2024-09-01 15:06:18,115][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth
[2024-09-01 15:06:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 843776. Throughput: 0: 237.2. Samples: 212898. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:20,141][00194] Avg episode reward: [(0, '4.935')]
[2024-09-01 15:06:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 847872. Throughput: 0: 226.6. Samples: 213408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:25,141][00194] Avg episode reward: [(0, '4.932')]
[2024-09-01 15:06:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 851968. Throughput: 0: 220.2. Samples: 214516. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:30,139][00194] Avg episode reward: [(0, '5.019')]
[2024-09-01 15:06:32,231][03021] Saving new best policy, reward=5.019!
[2024-09-01 15:06:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 856064. Throughput: 0: 241.8. Samples: 216530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:35,139][00194] Avg episode reward: [(0, '4.896')]
[2024-09-01 15:06:36,152][03034] Updated weights for policy 0, policy_version 210 (0.1038)
[2024-09-01 15:06:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 860160. Throughput: 0: 237.0. Samples: 216960. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:06:40,138][00194] Avg episode reward: [(0, '5.017')]
[2024-09-01 15:06:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 864256. Throughput: 0: 219.9. Samples: 218074. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:45,138][00194] Avg episode reward: [(0, '5.050')]
[2024-09-01 15:06:50,137][03021] Saving new best policy, reward=5.050!
[2024-09-01 15:06:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 868352. Throughput: 0: 228.8. Samples: 219678. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:50,148][00194] Avg episode reward: [(0, '5.099')]
[2024-09-01 15:06:54,115][03021] Saving new best policy, reward=5.099!
[2024-09-01 15:06:55,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 876544. Throughput: 0: 241.8. Samples: 220640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:06:55,143][00194] Avg episode reward: [(0, '5.143')]
[2024-09-01 15:06:59,804][03021] Saving new best policy, reward=5.143!
[2024-09-01 15:07:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.9, 300 sec: 902.5). Total num frames: 880640. Throughput: 0: 230.2. Samples: 221760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:00,143][00194] Avg episode reward: [(0, '5.173')]
[2024-09-01 15:07:04,754][03021] Saving new best policy, reward=5.173!
[2024-09-01 15:07:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 884736. Throughput: 0: 221.0. Samples: 222844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:07:05,139][00194] Avg episode reward: [(0, '5.234')]
[2024-09-01 15:07:08,565][03021] Saving new best policy, reward=5.234!
[2024-09-01 15:07:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 888832. Throughput: 0: 231.5. Samples: 223826. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:07:10,139][00194] Avg episode reward: [(0, '5.251')]
[2024-09-01 15:07:12,406][03021] Saving new best policy, reward=5.251!
[2024-09-01 15:07:15,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 892928. Throughput: 0: 236.7. Samples: 225168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:07:15,144][00194] Avg episode reward: [(0, '5.313')] [2024-09-01 15:07:18,300][03021] Saving new best policy, reward=5.313! [2024-09-01 15:07:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 897024. Throughput: 0: 215.1. Samples: 226208. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:07:20,141][00194] Avg episode reward: [(0, '5.267')] [2024-09-01 15:07:22,874][03034] Updated weights for policy 0, policy_version 220 (0.0056) [2024-09-01 15:07:25,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 901120. Throughput: 0: 221.4. Samples: 226922. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:07:25,138][00194] Avg episode reward: [(0, '5.138')] [2024-09-01 15:07:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 905216. Throughput: 0: 237.6. Samples: 228766. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:07:30,147][00194] Avg episode reward: [(0, '5.040')] [2024-09-01 15:07:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 909312. Throughput: 0: 227.8. Samples: 229928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:07:35,138][00194] Avg episode reward: [(0, '4.980')] [2024-09-01 15:07:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 913408. Throughput: 0: 213.1. Samples: 230228. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:07:40,143][00194] Avg episode reward: [(0, '4.943')] [2024-09-01 15:07:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 921600. Throughput: 0: 226.8. Samples: 231964. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:07:45,143][00194] Avg episode reward: [(0, '5.038')] [2024-09-01 15:07:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 925696. Throughput: 0: 232.8. Samples: 233320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:07:50,142][00194] Avg episode reward: [(0, '5.004')] [2024-09-01 15:07:55,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 929792. Throughput: 0: 225.4. Samples: 233968. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:07:55,144][00194] Avg episode reward: [(0, '5.034')] [2024-09-01 15:08:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 933888. Throughput: 0: 220.1. Samples: 235070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:08:00,138][00194] Avg episode reward: [(0, '5.008')] [2024-09-01 15:08:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 937984. Throughput: 0: 240.2. Samples: 237016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:08:05,139][00194] Avg episode reward: [(0, '5.089')] [2024-09-01 15:08:06,015][03034] Updated weights for policy 0, policy_version 230 (0.0049) [2024-09-01 15:08:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 942080. Throughput: 0: 232.2. Samples: 237372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:08:10,139][00194] Avg episode reward: [(0, '5.118')] [2024-09-01 15:08:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 946176. Throughput: 0: 216.0. Samples: 238488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:08:15,141][00194] Avg episode reward: [(0, '5.120')] [2024-09-01 15:08:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 950272. Throughput: 0: 227.3. Samples: 240156. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:08:20,148][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000233_954368.pth... [2024-09-01 15:08:20,144][00194] Avg episode reward: [(0, '5.153')] [2024-09-01 15:08:20,244][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth [2024-09-01 15:08:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 958464. Throughput: 0: 243.2. Samples: 241170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:08:25,145][00194] Avg episode reward: [(0, '5.289')] [2024-09-01 15:08:30,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 962560. Throughput: 0: 227.2. Samples: 242190. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:08:30,141][00194] Avg episode reward: [(0, '5.218')] [2024-09-01 15:08:35,139][00194] Fps is (10 sec: 819.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 966656. Throughput: 0: 223.0. Samples: 243356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:08:35,141][00194] Avg episode reward: [(0, '5.357')] [2024-09-01 15:08:38,012][03021] Saving new best policy, reward=5.357! [2024-09-01 15:08:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 970752. Throughput: 0: 228.4. Samples: 244246. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:08:40,138][00194] Avg episode reward: [(0, '5.368')] [2024-09-01 15:08:41,954][03021] Saving new best policy, reward=5.368! [2024-09-01 15:08:45,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 974848. Throughput: 0: 234.8. Samples: 245638. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:08:45,142][00194] Avg episode reward: [(0, '5.387')] [2024-09-01 15:08:47,650][03021] Saving new best policy, reward=5.387! 
[2024-09-01 15:08:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 978944. Throughput: 0: 216.0. Samples: 246738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:08:50,141][00194] Avg episode reward: [(0, '5.273')] [2024-09-01 15:08:51,987][03034] Updated weights for policy 0, policy_version 240 (0.0550) [2024-09-01 15:08:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 983040. Throughput: 0: 224.0. Samples: 247452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:08:55,139][00194] Avg episode reward: [(0, '5.202')] [2024-09-01 15:09:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 991232. Throughput: 0: 240.3. Samples: 249302. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:00,140][00194] Avg episode reward: [(0, '5.264')] [2024-09-01 15:09:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 991232. Throughput: 0: 225.0. Samples: 250282. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:05,139][00194] Avg episode reward: [(0, '5.497')] [2024-09-01 15:09:09,936][03021] Saving new best policy, reward=5.497! [2024-09-01 15:09:10,137][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 999424. Throughput: 0: 215.1. Samples: 250848. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:10,141][00194] Avg episode reward: [(0, '5.322')] [2024-09-01 15:09:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1003520. Throughput: 0: 225.7. Samples: 252344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:15,138][00194] Avg episode reward: [(0, '5.286')] [2024-09-01 15:09:20,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1007616. Throughput: 0: 232.5. Samples: 253816. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:20,141][00194] Avg episode reward: [(0, '5.171')] [2024-09-01 15:09:25,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1011712. Throughput: 0: 225.7. Samples: 254404. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:25,143][00194] Avg episode reward: [(0, '5.128')] [2024-09-01 15:09:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1015808. Throughput: 0: 224.1. Samples: 255722. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:09:30,140][00194] Avg episode reward: [(0, '5.294')] [2024-09-01 15:09:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1019904. Throughput: 0: 239.0. Samples: 257494. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:09:35,144][00194] Avg episode reward: [(0, '5.326')] [2024-09-01 15:09:35,395][03034] Updated weights for policy 0, policy_version 250 (0.0061) [2024-09-01 15:09:39,394][03021] Signal inference workers to stop experience collection... (250 times) [2024-09-01 15:09:39,524][03034] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-09-01 15:09:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1024000. Throughput: 0: 234.9. Samples: 258024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:09:40,141][00194] Avg episode reward: [(0, '5.417')] [2024-09-01 15:09:41,133][03021] Signal inference workers to resume experience collection... (250 times) [2024-09-01 15:09:41,136][03034] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-09-01 15:09:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1028096. Throughput: 0: 219.5. Samples: 259178. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:09:45,139][00194] Avg episode reward: [(0, '5.498')] [2024-09-01 15:09:49,340][03021] Saving new best policy, reward=5.498! [2024-09-01 15:09:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1036288. Throughput: 0: 230.7. Samples: 260664. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:09:50,139][00194] Avg episode reward: [(0, '5.447')] [2024-09-01 15:09:55,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1040384. Throughput: 0: 238.4. Samples: 261574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:09:55,153][00194] Avg episode reward: [(0, '5.496')] [2024-09-01 15:10:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1044480. Throughput: 0: 227.6. Samples: 262586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:10:00,139][00194] Avg episode reward: [(0, '5.549')] [2024-09-01 15:10:03,716][03021] Saving new best policy, reward=5.549! [2024-09-01 15:10:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1048576. Throughput: 0: 225.6. Samples: 263966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:10:05,144][00194] Avg episode reward: [(0, '5.649')] [2024-09-01 15:10:07,553][03021] Saving new best policy, reward=5.649! [2024-09-01 15:10:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1052672. Throughput: 0: 227.4. Samples: 264638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:10,146][00194] Avg episode reward: [(0, '5.691')] [2024-09-01 15:10:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1056768. Throughput: 0: 228.5. Samples: 266004. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:15,139][00194] Avg episode reward: [(0, '5.710')] [2024-09-01 15:10:17,000][03021] Saving new best policy, reward=5.691! [2024-09-01 15:10:17,103][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth... [2024-09-01 15:10:17,286][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000206_843776.pth [2024-09-01 15:10:17,310][03021] Saving new best policy, reward=5.710! [2024-09-01 15:10:20,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1060864. Throughput: 0: 217.1. Samples: 267262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:20,139][00194] Avg episode reward: [(0, '5.703')] [2024-09-01 15:10:21,943][03034] Updated weights for policy 0, policy_version 260 (0.0722) [2024-09-01 15:10:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1064960. Throughput: 0: 219.8. Samples: 267914. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:10:25,139][00194] Avg episode reward: [(0, '5.700')] [2024-09-01 15:10:30,136][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1073152. Throughput: 0: 234.6. Samples: 269734. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:30,139][00194] Avg episode reward: [(0, '5.906')] [2024-09-01 15:10:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1073152. Throughput: 0: 224.4. Samples: 270760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:35,138][00194] Avg episode reward: [(0, '5.824')] [2024-09-01 15:10:35,459][03021] Saving new best policy, reward=5.906! [2024-09-01 15:10:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1081344. Throughput: 0: 215.5. Samples: 271270. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:40,139][00194] Avg episode reward: [(0, '5.875')] [2024-09-01 15:10:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1085440. Throughput: 0: 227.7. Samples: 272832. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:45,138][00194] Avg episode reward: [(0, '5.695')] [2024-09-01 15:10:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1089536. Throughput: 0: 231.3. Samples: 274376. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:50,138][00194] Avg episode reward: [(0, '5.816')] [2024-09-01 15:10:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1093632. Throughput: 0: 227.4. Samples: 274872. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:10:55,141][00194] Avg episode reward: [(0, '5.754')] [2024-09-01 15:11:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1097728. Throughput: 0: 221.0. Samples: 275948. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:11:00,139][00194] Avg episode reward: [(0, '5.781')] [2024-09-01 15:11:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1101824. Throughput: 0: 238.0. Samples: 277970. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:11:05,144][00194] Avg episode reward: [(0, '5.757')] [2024-09-01 15:11:05,615][03034] Updated weights for policy 0, policy_version 270 (0.1032) [2024-09-01 15:11:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1105920. Throughput: 0: 235.9. Samples: 278528. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:11:10,138][00194] Avg episode reward: [(0, '5.848')] [2024-09-01 15:11:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1110016. Throughput: 0: 218.7. Samples: 279576. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:11:15,143][00194] Avg episode reward: [(0, '5.596')] [2024-09-01 15:11:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1118208. Throughput: 0: 230.5. Samples: 281134. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:11:20,142][00194] Avg episode reward: [(0, '5.533')] [2024-09-01 15:11:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1122304. Throughput: 0: 240.7. Samples: 282100. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:11:25,138][00194] Avg episode reward: [(0, '5.706')] [2024-09-01 15:11:30,140][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1126400. Throughput: 0: 229.2. Samples: 283146. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:11:30,152][00194] Avg episode reward: [(0, '5.779')] [2024-09-01 15:11:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1130496. Throughput: 0: 225.8. Samples: 284536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:11:35,139][00194] Avg episode reward: [(0, '5.730')] [2024-09-01 15:11:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1134592. Throughput: 0: 230.4. Samples: 285242. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:11:40,144][00194] Avg episode reward: [(0, '5.756')] [2024-09-01 15:11:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1138688. Throughput: 0: 244.0. Samples: 286930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:11:45,138][00194] Avg episode reward: [(0, '5.751')] [2024-09-01 15:11:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1142784. Throughput: 0: 222.8. Samples: 287998. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:11:50,142][00194] Avg episode reward: [(0, '5.934')] [2024-09-01 15:11:51,522][03021] Saving new best policy, reward=5.934! [2024-09-01 15:11:51,531][03034] Updated weights for policy 0, policy_version 280 (0.0527) [2024-09-01 15:11:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1146880. Throughput: 0: 226.8. Samples: 288732. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:11:55,141][00194] Avg episode reward: [(0, '6.167')] [2024-09-01 15:11:59,094][03021] Saving new best policy, reward=6.167! [2024-09-01 15:12:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1155072. Throughput: 0: 238.4. Samples: 290302. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:12:00,143][00194] Avg episode reward: [(0, '6.073')] [2024-09-01 15:12:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1159168. Throughput: 0: 228.0. Samples: 291394. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:12:05,141][00194] Avg episode reward: [(0, '6.193')] [2024-09-01 15:12:09,569][03021] Saving new best policy, reward=6.193! [2024-09-01 15:12:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1163264. Throughput: 0: 222.5. Samples: 292112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:12:10,142][00194] Avg episode reward: [(0, '6.283')] [2024-09-01 15:12:13,418][03021] Saving new best policy, reward=6.283! [2024-09-01 15:12:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1167360. Throughput: 0: 230.2. Samples: 293504. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:12:15,143][00194] Avg episode reward: [(0, '6.266')] [2024-09-01 15:12:17,182][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000286_1171456.pth... 
[2024-09-01 15:12:17,285][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000233_954368.pth [2024-09-01 15:12:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1171456. Throughput: 0: 236.5. Samples: 295178. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:12:20,145][00194] Avg episode reward: [(0, '6.343')] [2024-09-01 15:12:22,818][03021] Saving new best policy, reward=6.343! [2024-09-01 15:12:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1175552. Throughput: 0: 228.8. Samples: 295536. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:12:25,148][00194] Avg episode reward: [(0, '6.317')] [2024-09-01 15:12:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1179648. Throughput: 0: 222.4. Samples: 296940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:12:30,139][00194] Avg episode reward: [(0, '6.348')] [2024-09-01 15:12:34,929][03021] Saving new best policy, reward=6.348! [2024-09-01 15:12:34,946][03034] Updated weights for policy 0, policy_version 290 (0.0073) [2024-09-01 15:12:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1187840. Throughput: 0: 235.0. Samples: 298572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:12:35,140][00194] Avg episode reward: [(0, '6.431')] [2024-09-01 15:12:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1187840. Throughput: 0: 234.4. Samples: 299280. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:12:40,139][00194] Avg episode reward: [(0, '6.319')] [2024-09-01 15:12:40,529][03021] Saving new best policy, reward=6.431! [2024-09-01 15:12:45,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1191936. Throughput: 0: 224.2. Samples: 300392. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:12:45,138][00194] Avg episode reward: [(0, '6.147')] [2024-09-01 15:12:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1200128. Throughput: 0: 229.2. Samples: 301710. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:12:50,143][00194] Avg episode reward: [(0, '6.167')] [2024-09-01 15:12:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1204224. Throughput: 0: 236.3. Samples: 302744. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:12:55,152][00194] Avg episode reward: [(0, '6.236')] [2024-09-01 15:13:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1208320. Throughput: 0: 228.6. Samples: 303792. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:13:00,138][00194] Avg episode reward: [(0, '6.283')] [2024-09-01 15:13:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1212416. Throughput: 0: 219.6. Samples: 305060. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:13:05,138][00194] Avg episode reward: [(0, '6.165')] [2024-09-01 15:13:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1216512. Throughput: 0: 228.8. Samples: 305830. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:13:10,140][00194] Avg episode reward: [(0, '6.279')] [2024-09-01 15:13:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1220608. Throughput: 0: 229.5. Samples: 307266. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:13:15,139][00194] Avg episode reward: [(0, '6.328')] [2024-09-01 15:13:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1224704. Throughput: 0: 221.9. Samples: 308558. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:13:20,145][00194] Avg episode reward: [(0, '6.246')] [2024-09-01 15:13:21,318][03034] Updated weights for policy 0, policy_version 300 (0.2803) [2024-09-01 15:13:23,529][03021] Signal inference workers to stop experience collection... (300 times) [2024-09-01 15:13:23,568][03034] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-09-01 15:13:24,995][03021] Signal inference workers to resume experience collection... (300 times) [2024-09-01 15:13:24,996][03034] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-09-01 15:13:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1232896. Throughput: 0: 224.6. Samples: 309386. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:13:25,142][00194] Avg episode reward: [(0, '6.128')] [2024-09-01 15:13:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1236992. Throughput: 0: 232.9. Samples: 310872. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:13:30,141][00194] Avg episode reward: [(0, '6.167')] [2024-09-01 15:13:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1241088. Throughput: 0: 226.8. Samples: 311914. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:13:35,142][00194] Avg episode reward: [(0, '6.330')] [2024-09-01 15:13:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1245184. Throughput: 0: 218.6. Samples: 312580. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:13:40,139][00194] Avg episode reward: [(0, '6.510')] [2024-09-01 15:13:42,819][03021] Saving new best policy, reward=6.510! [2024-09-01 15:13:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1249280. Throughput: 0: 231.6. Samples: 314216. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:13:45,142][00194] Avg episode reward: [(0, '6.641')]
[2024-09-01 15:13:50,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1253376. Throughput: 0: 220.0. Samples: 314960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:13:50,145][00194] Avg episode reward: [(0, '6.569')]
[2024-09-01 15:13:55,152][00194] Fps is (10 sec: 409.0, 60 sec: 819.0, 300 sec: 888.6). Total num frames: 1253376. Throughput: 0: 210.5. Samples: 315306. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:13:55,161][00194] Avg episode reward: [(0, '6.626')]
[2024-09-01 15:13:59,760][03021] Saving new best policy, reward=6.641!
[2024-09-01 15:14:00,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1257472. Throughput: 0: 190.1. Samples: 315820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:14:00,142][00194] Avg episode reward: [(0, '6.549')]
[2024-09-01 15:14:05,136][00194] Fps is (10 sec: 410.2, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 1257472. Throughput: 0: 184.2. Samples: 316846. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:14:05,142][00194] Avg episode reward: [(0, '6.598')]
[2024-09-01 15:14:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1265664. Throughput: 0: 181.2. Samples: 317540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:14:10,141][00194] Avg episode reward: [(0, '6.628')]
[2024-09-01 15:14:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1269760. Throughput: 0: 178.1. Samples: 318886. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:15,145][03034] Updated weights for policy 0, policy_version 310 (0.3333)
[2024-09-01 15:14:15,146][00194] Avg episode reward: [(0, '6.511')]
[2024-09-01 15:14:19,878][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth...
[2024-09-01 15:14:19,988][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth
[2024-09-01 15:14:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1273856. Throughput: 0: 177.3. Samples: 319894. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:20,144][00194] Avg episode reward: [(0, '6.553')]
[2024-09-01 15:14:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1277952. Throughput: 0: 185.9. Samples: 320944. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:25,139][00194] Avg episode reward: [(0, '6.714')]
[2024-09-01 15:14:27,506][03021] Saving new best policy, reward=6.714!
[2024-09-01 15:14:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1282048. Throughput: 0: 179.1. Samples: 322274. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:30,144][00194] Avg episode reward: [(0, '6.913')]
[2024-09-01 15:14:33,142][03021] Saving new best policy, reward=6.913!
[2024-09-01 15:14:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1286144. Throughput: 0: 185.7. Samples: 323316. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:35,139][00194] Avg episode reward: [(0, '6.828')]
[2024-09-01 15:14:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 1290240. Throughput: 0: 193.5. Samples: 324012. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:40,139][00194] Avg episode reward: [(0, '6.677')]
[2024-09-01 15:14:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 1294336. Throughput: 0: 218.5. Samples: 325652. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:14:45,145][00194] Avg episode reward: [(0, '6.578')]
[2024-09-01 15:14:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 874.7). Total num frames: 1298432. Throughput: 0: 225.8. Samples: 327008. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:14:50,141][00194] Avg episode reward: [(0, '6.610')]
[2024-09-01 15:14:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.4, 300 sec: 874.7). Total num frames: 1302528. Throughput: 0: 218.0. Samples: 327352. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:14:55,140][00194] Avg episode reward: [(0, '6.819')]
[2024-09-01 15:14:59,874][03034] Updated weights for policy 0, policy_version 320 (0.0546)
[2024-09-01 15:15:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1310720. Throughput: 0: 226.2. Samples: 329064. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:15:00,142][00194] Avg episode reward: [(0, '6.961')]
[2024-09-01 15:15:03,678][03021] Saving new best policy, reward=6.961!
[2024-09-01 15:15:05,140][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1314816. Throughput: 0: 235.3. Samples: 330482. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 15:15:05,151][00194] Avg episode reward: [(0, '7.015')]
[2024-09-01 15:15:09,277][03021] Saving new best policy, reward=7.015!
[2024-09-01 15:15:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1318912. Throughput: 0: 227.0. Samples: 331158. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:10,145][00194] Avg episode reward: [(0, '7.072')]
[2024-09-01 15:15:14,145][03021] Saving new best policy, reward=7.072!
[2024-09-01 15:15:15,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1323008. Throughput: 0: 221.0. Samples: 332220. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:15,143][00194] Avg episode reward: [(0, '7.007')]
[2024-09-01 15:15:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1327104. Throughput: 0: 234.8. Samples: 333882. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:20,143][00194] Avg episode reward: [(0, '6.911')]
[2024-09-01 15:15:25,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1331200. Throughput: 0: 234.4. Samples: 334562. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:25,147][00194] Avg episode reward: [(0, '6.986')]
[2024-09-01 15:15:30,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1335296. Throughput: 0: 221.2. Samples: 335608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:30,143][00194] Avg episode reward: [(0, '6.933')]
[2024-09-01 15:15:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1339392. Throughput: 0: 225.0. Samples: 337132. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:35,139][00194] Avg episode reward: [(0, '7.059')]
[2024-09-01 15:15:40,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1347584. Throughput: 0: 237.3. Samples: 338032. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:40,142][00194] Avg episode reward: [(0, '7.376')]
[2024-09-01 15:15:45,089][03021] Saving new best policy, reward=7.376!
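The checkpoint entries above follow a fixed naming scheme: `checkpoint_<policy_version>_<total_env_frames>.pth`, with the version zero-padded to nine digits. In this run every checkpoint in the log satisfies frames = policy_version × 4096 (e.g. 311 × 4096 = 1273856), so a minimal sketch reconstructing the names, under that inferred assumption, looks like:

```python
# Sketch: reconstruct the checkpoint filenames seen in this log.
# Assumption (inferred from the log itself, not from framework docs):
# each policy version corresponds to 4096 environment frames here.
FRAMES_PER_VERSION = 4096

def checkpoint_name(policy_version: int) -> str:
    """Name encodes the policy version (zero-padded to 9 digits)
    and the total number of environment frames at save time."""
    frames = policy_version * FRAMES_PER_VERSION
    return f"checkpoint_{policy_version:09d}_{frames}.pth"

print(checkpoint_name(311))  # checkpoint_000000311_1273856.pth
print(checkpoint_name(338))  # checkpoint_000000338_1384448.pth
```

The trainer keeps a rolling pair of checkpoints: each save is followed by removal of the oldest one, as in the `Saving .../checkpoint_000000311_1273856.pth` / `Removing .../checkpoint_000000259_1060864.pth` lines.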
[2024-09-01 15:15:45,099][03034] Updated weights for policy 0, policy_version 330 (0.0529)
[2024-09-01 15:15:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1351680. Throughput: 0: 229.2. Samples: 339380. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:45,139][00194] Avg episode reward: [(0, '7.517')]
[2024-09-01 15:15:49,923][03021] Saving new best policy, reward=7.517!
[2024-09-01 15:15:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1355776. Throughput: 0: 220.7. Samples: 340412. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:15:50,139][00194] Avg episode reward: [(0, '7.502')]
[2024-09-01 15:15:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1359872. Throughput: 0: 229.2. Samples: 341472. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:15:55,139][00194] Avg episode reward: [(0, '7.252')]
[2024-09-01 15:16:00,139][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1363968. Throughput: 0: 236.3. Samples: 342856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:16:00,147][00194] Avg episode reward: [(0, '7.138')]
[2024-09-01 15:16:05,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1368064. Throughput: 0: 219.8. Samples: 343772. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:16:05,144][00194] Avg episode reward: [(0, '7.174')]
[2024-09-01 15:16:10,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1372160. Throughput: 0: 224.0. Samples: 344644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:16:10,142][00194] Avg episode reward: [(0, '7.181')]
[2024-09-01 15:16:15,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1376256. Throughput: 0: 241.7. Samples: 346482. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:16:15,144][00194] Avg episode reward: [(0, '7.125')]
[2024-09-01 15:16:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1380352. Throughput: 0: 233.2. Samples: 347628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:16:20,139][00194] Avg episode reward: [(0, '7.162')]
[2024-09-01 15:16:21,157][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000338_1384448.pth...
[2024-09-01 15:16:21,298][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000286_1171456.pth
[2024-09-01 15:16:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 1384448. Throughput: 0: 224.4. Samples: 348128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:16:25,146][00194] Avg episode reward: [(0, '7.133')]
[2024-09-01 15:16:29,921][03034] Updated weights for policy 0, policy_version 340 (0.1170)
[2024-09-01 15:16:30,137][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1392640. Throughput: 0: 232.4. Samples: 349840. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:16:30,139][00194] Avg episode reward: [(0, '7.302')]
[2024-09-01 15:16:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1396736. Throughput: 0: 234.5. Samples: 350966. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:16:35,141][00194] Avg episode reward: [(0, '7.031')]
[2024-09-01 15:16:40,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1400832. Throughput: 0: 226.8. Samples: 351678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:16:40,140][00194] Avg episode reward: [(0, '6.916')]
[2024-09-01 15:16:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1404928. Throughput: 0: 224.9. Samples: 352976. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:16:45,139][00194] Avg episode reward: [(0, '6.753')]
[2024-09-01 15:16:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1409024. Throughput: 0: 243.8. Samples: 354744. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:16:50,154][00194] Avg episode reward: [(0, '6.763')]
[2024-09-01 15:16:55,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 1413120. Throughput: 0: 233.5. Samples: 355154. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:16:55,147][00194] Avg episode reward: [(0, '6.884')]
[2024-09-01 15:17:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1417216. Throughput: 0: 218.9. Samples: 356334. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:17:00,141][00194] Avg episode reward: [(0, '6.823')]
[2024-09-01 15:17:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1421312. Throughput: 0: 230.3. Samples: 357992. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:17:05,144][00194] Avg episode reward: [(0, '7.117')]
[2024-09-01 15:17:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1429504. Throughput: 0: 236.2. Samples: 358756. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:10,139][00194] Avg episode reward: [(0, '7.097')]
[2024-09-01 15:17:14,793][03034] Updated weights for policy 0, policy_version 350 (0.1582)
[2024-09-01 15:17:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1433600. Throughput: 0: 225.8. Samples: 360000. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:15,139][00194] Avg episode reward: [(0, '7.085')]
[2024-09-01 15:17:18,875][03021] Signal inference workers to stop experience collection... (350 times)
[2024-09-01 15:17:18,910][03034] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2024-09-01 15:17:19,857][03021] Signal inference workers to resume experience collection... (350 times)
[2024-09-01 15:17:19,859][03034] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-09-01 15:17:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1437696. Throughput: 0: 224.4. Samples: 361062. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:20,145][00194] Avg episode reward: [(0, '7.128')]
[2024-09-01 15:17:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1441792. Throughput: 0: 229.5. Samples: 362004. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:25,143][00194] Avg episode reward: [(0, '7.602')]
[2024-09-01 15:17:27,477][03021] Saving new best policy, reward=7.602!
[2024-09-01 15:17:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1445888. Throughput: 0: 236.4. Samples: 363612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:30,138][00194] Avg episode reward: [(0, '7.554')]
[2024-09-01 15:17:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1449984. Throughput: 0: 219.3. Samples: 364612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:35,139][00194] Avg episode reward: [(0, '7.587')]
[2024-09-01 15:17:40,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1454080. Throughput: 0: 222.6. Samples: 365170. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:40,139][00194] Avg episode reward: [(0, '7.781')]
[2024-09-01 15:17:41,646][03021] Saving new best policy, reward=7.781!
[2024-09-01 15:17:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1458176. Throughput: 0: 239.3. Samples: 367104. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:45,146][00194] Avg episode reward: [(0, '7.998')]
[2024-09-01 15:17:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1462272. Throughput: 0: 228.2. Samples: 368260. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:17:50,139][00194] Avg episode reward: [(0, '8.053')]
[2024-09-01 15:17:50,553][03021] Saving new best policy, reward=7.998!
[2024-09-01 15:17:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1466368. Throughput: 0: 220.9. Samples: 368696. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:17:55,140][00194] Avg episode reward: [(0, '8.182')]
[2024-09-01 15:17:55,831][03021] Saving new best policy, reward=8.053!
[2024-09-01 15:17:59,646][03021] Saving new best policy, reward=8.182!
[2024-09-01 15:17:59,666][03034] Updated weights for policy 0, policy_version 360 (0.1053)
[2024-09-01 15:18:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1474560. Throughput: 0: 230.3. Samples: 370362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:00,139][00194] Avg episode reward: [(0, '8.015')]
[2024-09-01 15:18:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1478656. Throughput: 0: 233.6. Samples: 371574. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:05,138][00194] Avg episode reward: [(0, '8.058')]
[2024-09-01 15:18:10,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1482752. Throughput: 0: 228.5. Samples: 372288. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:18:10,151][00194] Avg episode reward: [(0, '8.125')]
[2024-09-01 15:18:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1486848. Throughput: 0: 216.3. Samples: 373346. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:18:15,140][00194] Avg episode reward: [(0, '7.960')]
[2024-09-01 15:18:17,522][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000364_1490944.pth...
[2024-09-01 15:18:17,625][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth
[2024-09-01 15:18:20,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1490944. Throughput: 0: 236.2. Samples: 375240. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:20,147][00194] Avg episode reward: [(0, '7.430')]
[2024-09-01 15:18:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1495040. Throughput: 0: 233.9. Samples: 375696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:25,141][00194] Avg episode reward: [(0, '7.439')]
[2024-09-01 15:18:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1499136. Throughput: 0: 218.5. Samples: 376938. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:30,139][00194] Avg episode reward: [(0, '7.559')]
[2024-09-01 15:18:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 1503232. Throughput: 0: 224.2. Samples: 378348. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:35,147][00194] Avg episode reward: [(0, '7.365')]
[2024-09-01 15:18:40,136][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1511424. Throughput: 0: 236.2. Samples: 379326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:18:40,138][00194] Avg episode reward: [(0, '7.387')]
[2024-09-01 15:18:44,260][03034] Updated weights for policy 0, policy_version 370 (0.1036)
[2024-09-01 15:18:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 1515520. Throughput: 0: 225.3. Samples: 380500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:18:45,140][00194] Avg episode reward: [(0, '7.382')]
[2024-09-01 15:18:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.6). Total num frames: 1519616. Throughput: 0: 223.1. Samples: 381612. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:18:50,139][00194] Avg episode reward: [(0, '7.451')]
[2024-09-01 15:18:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1523712. Throughput: 0: 226.3. Samples: 382472. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:18:55,139][00194] Avg episode reward: [(0, '7.643')]
[2024-09-01 15:19:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1527808. Throughput: 0: 241.7. Samples: 384222. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:19:00,139][00194] Avg episode reward: [(0, '7.557')]
[2024-09-01 15:19:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1531904. Throughput: 0: 221.1. Samples: 385188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:19:05,141][00194] Avg episode reward: [(0, '7.454')]
[2024-09-01 15:19:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 1536000. Throughput: 0: 223.6. Samples: 385758. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:19:10,147][00194] Avg episode reward: [(0, '7.521')]
[2024-09-01 15:19:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1544192. Throughput: 0: 237.3. Samples: 387616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:19:15,143][00194] Avg episode reward: [(0, '7.540')]
[2024-09-01 15:19:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1548288. Throughput: 0: 228.8. Samples: 388646. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:19:20,144][00194] Avg episode reward: [(0, '7.705')]
[2024-09-01 15:19:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1552384. Throughput: 0: 224.5. Samples: 389428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:19:25,139][00194] Avg episode reward: [(0, '7.571')]
[2024-09-01 15:19:28,646][03034] Updated weights for policy 0, policy_version 380 (0.0036)
[2024-09-01 15:19:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1556480. Throughput: 0: 227.3. Samples: 390730. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:19:30,139][00194] Avg episode reward: [(0, '7.616')]
[2024-09-01 15:19:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1560576. Throughput: 0: 241.3. Samples: 392472. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:19:35,139][00194] Avg episode reward: [(0, '7.622')]
[2024-09-01 15:19:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1564672. Throughput: 0: 228.7. Samples: 392762. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:19:40,146][00194] Avg episode reward: [(0, '7.711')]
[2024-09-01 15:19:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1568768. Throughput: 0: 221.4. Samples: 394184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:19:45,144][00194] Avg episode reward: [(0, '7.633')]
[2024-09-01 15:19:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1576960. Throughput: 0: 236.8. Samples: 395842. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:19:50,147][00194] Avg episode reward: [(0, '7.947')]
[2024-09-01 15:19:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1576960. Throughput: 0: 238.4. Samples: 396488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:19:55,143][00194] Avg episode reward: [(0, '8.099')]
[2024-09-01 15:20:00,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1581056. Throughput: 0: 222.7. Samples: 397638. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:00,141][00194] Avg episode reward: [(0, '8.349')]
[2024-09-01 15:20:04,372][03021] Saving new best policy, reward=8.349!
[2024-09-01 15:20:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1589248. Throughput: 0: 229.9. Samples: 398990. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:05,147][00194] Avg episode reward: [(0, '8.654')]
[2024-09-01 15:20:08,131][03021] Saving new best policy, reward=8.654!
[2024-09-01 15:20:10,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1593344. Throughput: 0: 232.7. Samples: 399902. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:10,171][00194] Avg episode reward: [(0, '8.769')]
[2024-09-01 15:20:13,538][03021] Saving new best policy, reward=8.769!
[2024-09-01 15:20:13,562][03034] Updated weights for policy 0, policy_version 390 (0.0562)
[2024-09-01 15:20:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1597440. Throughput: 0: 227.4. Samples: 400964. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:15,141][00194] Avg episode reward: [(0, '8.925')]
[2024-09-01 15:20:18,534][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000391_1601536.pth...
[2024-09-01 15:20:18,642][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000338_1384448.pth
[2024-09-01 15:20:18,660][03021] Saving new best policy, reward=8.925!
[2024-09-01 15:20:20,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1601536. Throughput: 0: 220.8. Samples: 402406. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:20,139][00194] Avg episode reward: [(0, '8.885')]
[2024-09-01 15:20:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1605632. Throughput: 0: 228.4. Samples: 403040. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:25,139][00194] Avg episode reward: [(0, '9.058')]
[2024-09-01 15:20:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1609728. Throughput: 0: 235.3. Samples: 404774. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:30,139][00194] Avg episode reward: [(0, '9.159')]
[2024-09-01 15:20:31,569][03021] Saving new best policy, reward=9.058!
[2024-09-01 15:20:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1613824. Throughput: 0: 221.2. Samples: 405798. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:35,144][00194] Avg episode reward: [(0, '9.114')]
[2024-09-01 15:20:36,747][03021] Saving new best policy, reward=9.159!
[2024-09-01 15:20:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1617920. Throughput: 0: 221.7. Samples: 406466. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:40,139][00194] Avg episode reward: [(0, '9.150')]
[2024-09-01 15:20:45,137][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1626112. Throughput: 0: 233.1. Samples: 408126. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:45,141][00194] Avg episode reward: [(0, '9.159')]
[2024-09-01 15:20:50,136][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1630208. Throughput: 0: 227.6. Samples: 409234. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:50,143][00194] Avg episode reward: [(0, '9.159')]
[2024-09-01 15:20:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1634304. Throughput: 0: 222.7. Samples: 409924. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:20:55,138][00194] Avg episode reward: [(0, '8.673')]
[2024-09-01 15:20:58,503][03034] Updated weights for policy 0, policy_version 400 (0.1013)
[2024-09-01 15:21:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1638400. Throughput: 0: 228.9. Samples: 411264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:21:00,139][00194] Avg episode reward: [(0, '8.707')]
[2024-09-01 15:21:00,841][03021] Signal inference workers to stop experience collection... (400 times)
[2024-09-01 15:21:00,894][03034] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-09-01 15:21:01,769][03021] Signal inference workers to resume experience collection... (400 times)
[2024-09-01 15:21:01,770][03034] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-09-01 15:21:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1642496. Throughput: 0: 236.0. Samples: 413024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:21:05,144][00194] Avg episode reward: [(0, '8.509')]
[2024-09-01 15:21:10,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1646592. Throughput: 0: 229.0. Samples: 413346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:21:10,145][00194] Avg episode reward: [(0, '8.860')]
[2024-09-01 15:21:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1650688. Throughput: 0: 225.6. Samples: 414924. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:21:15,145][00194] Avg episode reward: [(0, '8.898')]
[2024-09-01 15:21:20,136][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1658880. Throughput: 0: 235.2. Samples: 416384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:21:20,139][00194] Avg episode reward: [(0, '8.862')]
[2024-09-01 15:21:25,142][00194] Fps is (10 sec: 1228.0, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 1662976. Throughput: 0: 240.5. Samples: 417292. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:21:25,145][00194] Avg episode reward: [(0, '8.784')]
[2024-09-01 15:21:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1667072. Throughput: 0: 225.9. Samples: 418292. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:21:30,145][00194] Avg episode reward: [(0, '8.625')]
[2024-09-01 15:21:35,136][00194] Fps is (10 sec: 819.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1671168. Throughput: 0: 235.0. Samples: 419808. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:21:35,139][00194] Avg episode reward: [(0, '8.619')]
[2024-09-01 15:21:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1675264. Throughput: 0: 235.3. Samples: 420512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:21:40,139][00194] Avg episode reward: [(0, '8.796')]
[2024-09-01 15:21:42,049][03034] Updated weights for policy 0, policy_version 410 (0.0512)
[2024-09-01 15:21:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1679360. Throughput: 0: 230.7. Samples: 421646. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:21:45,142][00194] Avg episode reward: [(0, '8.972')]
[2024-09-01 15:21:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1683456. Throughput: 0: 228.8. Samples: 423320. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:21:50,147][00194] Avg episode reward: [(0, '8.799')]
[2024-09-01 15:21:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1691648. Throughput: 0: 237.5. Samples: 424032. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:21:55,141][00194] Avg episode reward: [(0, '8.738')]
[2024-09-01 15:22:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1695744. Throughput: 0: 232.9. Samples: 425404. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:00,142][00194] Avg episode reward: [(0, '8.934')]
[2024-09-01 15:22:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1699840. Throughput: 0: 224.1. Samples: 426470. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:05,138][00194] Avg episode reward: [(0, '8.926')]
[2024-09-01 15:22:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 1703936. Throughput: 0: 227.1. Samples: 427512. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:10,139][00194] Avg episode reward: [(0, '9.215')]
[2024-09-01 15:22:12,456][03021] Saving new best policy, reward=9.215!
[2024-09-01 15:22:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1708032. Throughput: 0: 236.5. Samples: 428934. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:15,138][00194] Avg episode reward: [(0, '9.419')]
[2024-09-01 15:22:17,425][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth...
[2024-09-01 15:22:17,549][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000364_1490944.pth
[2024-09-01 15:22:17,568][03021] Saving new best policy, reward=9.419!
[2024-09-01 15:22:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1712128. Throughput: 0: 225.7. Samples: 429964. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:22:20,144][00194] Avg episode reward: [(0, '9.369')]
[2024-09-01 15:22:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 1716224. Throughput: 0: 221.6. Samples: 430486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:22:25,139][00194] Avg episode reward: [(0, '9.617')]
[2024-09-01 15:22:26,655][03021] Saving new best policy, reward=9.617!
[2024-09-01 15:22:26,665][03034] Updated weights for policy 0, policy_version 420 (0.0538)
[2024-09-01 15:22:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1720320. Throughput: 0: 238.7. Samples: 432388. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:30,144][00194] Avg episode reward: [(0, '9.930')]
[2024-09-01 15:22:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1724416. Throughput: 0: 227.9. Samples: 433574. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:35,138][00194] Avg episode reward: [(0, '9.955')]
[2024-09-01 15:22:35,506][03021] Saving new best policy, reward=9.930!
[2024-09-01 15:22:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1728512. Throughput: 0: 222.9. Samples: 434062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:22:40,139][00194] Avg episode reward: [(0, '10.560')]
[2024-09-01 15:22:40,830][03021] Saving new best policy, reward=9.955!
[2024-09-01 15:22:44,663][03021] Saving new best policy, reward=10.560!
[2024-09-01 15:22:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1736704. Throughput: 0: 229.2. Samples: 435716. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:45,138][00194] Avg episode reward: [(0, '10.784')]
[2024-09-01 15:22:48,436][03021] Saving new best policy, reward=10.784!
[2024-09-01 15:22:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1740800. Throughput: 0: 238.2. Samples: 437188. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:22:50,142][00194] Avg episode reward: [(0, '10.897')]
[2024-09-01 15:22:53,734][03021] Saving new best policy, reward=10.897!
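The "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" entries report frame throughput averaged over trailing windows of three lengths, computed from periodic (timestamp, total frames) samples; e.g. 8192 frames over a 10-second window gives the 819.2 figure that recurs above. A minimal sketch of this windowed-rate bookkeeping (names are illustrative, not the actual runner code):

```python
from collections import deque


class FpsTracker:
    """Keep (time, total_frames) samples and report the average frame
    rate over a trailing window, like the 10/60/300-sec figures in the
    log. Illustrative sketch only, assuming evenly spaced reports."""

    def __init__(self, max_window: float = 300.0):
        self.samples = deque()  # (timestamp, total_frames)
        self.max_window = max_window

    def record(self, t: float, total_frames: int) -> None:
        self.samples.append((t, total_frames))
        # Drop samples older than the longest window we report on.
        while self.samples and t - self.samples[0][0] > self.max_window:
            self.samples.popleft()

    def fps(self, window: float) -> float:
        t_now, f_now = self.samples[-1]
        # Oldest retained sample that still falls inside the window.
        old = next((s for s in self.samples if t_now - s[0] <= window),
                   self.samples[0])
        dt = t_now - old[0]
        return (f_now - old[1]) / dt if dt > 0 else 0.0


tracker = FpsTracker()
# Feed samples mimicking the log: a report every 5 s, 4096 frames each.
for i in range(5):
    tracker.record(i * 5.0, i * 4096)
print(round(tracker.fps(10.0), 1))  # 819.2
```

The same samples explain the halved readings (409.6) whenever a 10-second window happens to contain only one 4096-frame increment, and the occasional 1228.8 when it contains three.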
[2024-09-01 15:22:55,140][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1744896. Throughput: 0: 227.2. Samples: 437736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:22:55,143][00194] Avg episode reward: [(0, '10.913')]
[2024-09-01 15:22:58,742][03021] Saving new best policy, reward=10.913!
[2024-09-01 15:23:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1748992. Throughput: 0: 218.8. Samples: 438778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:23:00,138][00194] Avg episode reward: [(0, '11.191')]
[2024-09-01 15:23:02,560][03021] Saving new best policy, reward=11.191!
[2024-09-01 15:23:05,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1753088. Throughput: 0: 236.0. Samples: 440584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:05,140][00194] Avg episode reward: [(0, '11.546')]
[2024-09-01 15:23:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1757184. Throughput: 0: 238.4. Samples: 441212. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:10,139][00194] Avg episode reward: [(0, '11.671')]
[2024-09-01 15:23:11,874][03021] Saving new best policy, reward=11.546!
[2024-09-01 15:23:11,903][03034] Updated weights for policy 0, policy_version 430 (0.0095)
[2024-09-01 15:23:11,973][03021] Saving new best policy, reward=11.671!
[2024-09-01 15:23:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1761280. Throughput: 0: 218.6. Samples: 442226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:23:15,148][00194] Avg episode reward: [(0, '11.796')]
[2024-09-01 15:23:16,760][03021] Saving new best policy, reward=11.796!
[2024-09-01 15:23:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1765376. Throughput: 0: 229.0. Samples: 443878. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:23:20,142][00194] Avg episode reward: [(0, '11.817')]
[2024-09-01 15:23:24,269][03021] Saving new best policy, reward=11.817!
[2024-09-01 15:23:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1773568. Throughput: 0: 237.4. Samples: 444746. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:25,145][00194] Avg episode reward: [(0, '11.696')]
[2024-09-01 15:23:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 930.3). Total num frames: 1777664. Throughput: 0: 228.0. Samples: 445978. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:30,138][00194] Avg episode reward: [(0, '11.624')]
[2024-09-01 15:23:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1781760. Throughput: 0: 218.2. Samples: 447008. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:35,144][00194] Avg episode reward: [(0, '11.782')]
[2024-09-01 15:23:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 1785856. Throughput: 0: 229.6. Samples: 448068. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:40,141][00194] Avg episode reward: [(0, '11.478')]
[2024-09-01 15:23:45,137][00194] Fps is (10 sec: 819.2, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1789952. Throughput: 0: 236.1. Samples: 449402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:23:45,143][00194] Avg episode reward: [(0, '11.481')]
[2024-09-01 15:23:50,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 1794048. Throughput: 0: 219.1. Samples: 450444. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:23:50,141][00194] Avg episode reward: [(0, '11.190')]
[2024-09-01 15:23:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1798144. Throughput: 0: 220.0. Samples: 451112. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:23:55,140][00194] Avg episode reward: [(0, '11.275')]
[2024-09-01 15:23:57,823][03034] Updated weights for policy 0, policy_version 440 (0.1042)
[2024-09-01 15:24:00,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1802240. Throughput: 0: 224.4. Samples: 452324. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:24:00,142][00194] Avg episode reward: [(0, '10.995')]
[2024-09-01 15:24:05,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1802240. Throughput: 0: 206.5. Samples: 453172. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:24:05,139][00194] Avg episode reward: [(0, '10.939')]
[2024-09-01 15:24:10,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1806336. Throughput: 0: 195.7. Samples: 453554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:24:10,142][00194] Avg episode reward: [(0, '10.673')]
[2024-09-01 15:24:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1810432. Throughput: 0: 201.9. Samples: 455062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:24:15,138][00194] Avg episode reward: [(0, '10.622')]
[2024-09-01 15:24:18,898][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000444_1818624.pth...
[2024-09-01 15:24:19,000][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000391_1601536.pth
[2024-09-01 15:24:20,136][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1818624. Throughput: 0: 210.0. Samples: 456458. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:24:20,140][00194] Avg episode reward: [(0, '10.791')]
[2024-09-01 15:24:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1822720. Throughput: 0: 203.5. Samples: 457226.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:24:25,138][00194] Avg episode reward: [(0, '10.800')] [2024-09-01 15:24:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1826816. Throughput: 0: 196.8. Samples: 458258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:24:30,139][00194] Avg episode reward: [(0, '10.674')] [2024-09-01 15:24:35,139][00194] Fps is (10 sec: 818.9, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1830912. Throughput: 0: 206.7. Samples: 459744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:24:35,142][00194] Avg episode reward: [(0, '10.785')] [2024-09-01 15:24:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1835008. Throughput: 0: 205.1. Samples: 460340. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:24:40,150][00194] Avg episode reward: [(0, '10.915')] [2024-09-01 15:24:45,137][00194] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 1839104. Throughput: 0: 206.4. Samples: 461610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:24:45,144][00194] Avg episode reward: [(0, '10.841')] [2024-09-01 15:24:47,754][03034] Updated weights for policy 0, policy_version 450 (0.1988) [2024-09-01 15:24:50,104][03021] Signal inference workers to stop experience collection... (450 times) [2024-09-01 15:24:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 1843200. Throughput: 0: 220.3. Samples: 463084. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:24:50,139][00194] Avg episode reward: [(0, '10.751')] [2024-09-01 15:24:50,176][03034] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-09-01 15:24:51,090][03021] Signal inference workers to resume experience collection... 
(450 times) [2024-09-01 15:24:51,092][03034] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-09-01 15:24:55,136][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 1851392. Throughput: 0: 227.1. Samples: 463774. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:24:55,138][00194] Avg episode reward: [(0, '10.648')] [2024-09-01 15:25:00,138][00194] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1855488. Throughput: 0: 230.3. Samples: 465428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:25:00,141][00194] Avg episode reward: [(0, '10.846')] [2024-09-01 15:25:05,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1855488. Throughput: 0: 221.8. Samples: 466438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:25:05,143][00194] Avg episode reward: [(0, '11.166')] [2024-09-01 15:25:10,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1863680. Throughput: 0: 225.9. Samples: 467392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:10,139][00194] Avg episode reward: [(0, '11.250')] [2024-09-01 15:25:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1867776. Throughput: 0: 233.7. Samples: 468776. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:15,140][00194] Avg episode reward: [(0, '11.464')] [2024-09-01 15:25:20,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1871872. Throughput: 0: 223.7. Samples: 469810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:25:20,143][00194] Avg episode reward: [(0, '11.591')] [2024-09-01 15:25:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1875968. Throughput: 0: 226.4. Samples: 470528. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:25:25,138][00194] Avg episode reward: [(0, '11.710')] [2024-09-01 15:25:30,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1880064. Throughput: 0: 232.8. Samples: 472086. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:30,141][00194] Avg episode reward: [(0, '11.390')] [2024-09-01 15:25:31,140][03034] Updated weights for policy 0, policy_version 460 (0.1982) [2024-09-01 15:25:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1884160. Throughput: 0: 229.9. Samples: 473428. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:35,139][00194] Avg episode reward: [(0, '11.091')] [2024-09-01 15:25:40,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1888256. Throughput: 0: 225.8. Samples: 473934. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:40,144][00194] Avg episode reward: [(0, '11.271')] [2024-09-01 15:25:45,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1896448. Throughput: 0: 226.7. Samples: 475628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:45,144][00194] Avg episode reward: [(0, '11.524')] [2024-09-01 15:25:50,136][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1900544. Throughput: 0: 235.2. Samples: 477024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:25:50,141][00194] Avg episode reward: [(0, '11.603')] [2024-09-01 15:25:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1904640. Throughput: 0: 229.6. Samples: 477724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:25:55,143][00194] Avg episode reward: [(0, '11.826')] [2024-09-01 15:25:58,989][03021] Saving new best policy, reward=11.826! [2024-09-01 15:26:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). 
Total num frames: 1908736. Throughput: 0: 220.5. Samples: 478700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:26:00,138][00194] Avg episode reward: [(0, '11.704')] [2024-09-01 15:26:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1912832. Throughput: 0: 236.1. Samples: 480434. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:26:05,139][00194] Avg episode reward: [(0, '11.518')] [2024-09-01 15:26:10,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1916928. Throughput: 0: 234.1. Samples: 481064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:26:10,145][00194] Avg episode reward: [(0, '11.552')] [2024-09-01 15:26:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1921024. Throughput: 0: 221.6. Samples: 482058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:26:15,139][00194] Avg episode reward: [(0, '11.207')] [2024-09-01 15:26:17,400][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000470_1925120.pth... [2024-09-01 15:26:17,404][03034] Updated weights for policy 0, policy_version 470 (0.0541) [2024-09-01 15:26:17,508][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000418_1712128.pth [2024-09-01 15:26:20,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1925120. Throughput: 0: 230.8. Samples: 483814. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:26:20,146][00194] Avg episode reward: [(0, '11.148')] [2024-09-01 15:26:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1933312. Throughput: 0: 234.7. Samples: 484496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:26:25,139][00194] Avg episode reward: [(0, '11.199')] [2024-09-01 15:26:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1933312. 
Throughput: 0: 222.9. Samples: 485660. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:26:30,138][00194] Avg episode reward: [(0, '11.042')] [2024-09-01 15:26:35,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1937408. Throughput: 0: 222.0. Samples: 487012. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:26:35,139][00194] Avg episode reward: [(0, '11.254')] [2024-09-01 15:26:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1945600. Throughput: 0: 221.3. Samples: 487684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:26:40,139][00194] Avg episode reward: [(0, '11.548')] [2024-09-01 15:26:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 1949696. Throughput: 0: 231.0. Samples: 489094. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:26:45,138][00194] Avg episode reward: [(0, '11.684')] [2024-09-01 15:26:50,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1953792. Throughput: 0: 215.8. Samples: 490144. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:26:50,146][00194] Avg episode reward: [(0, '11.510')] [2024-09-01 15:26:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1957888. Throughput: 0: 223.5. Samples: 491122. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:26:55,139][00194] Avg episode reward: [(0, '12.261')] [2024-09-01 15:26:57,382][03021] Saving new best policy, reward=12.261! [2024-09-01 15:27:00,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1961984. Throughput: 0: 235.3. Samples: 492648. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:00,150][00194] Avg episode reward: [(0, '11.878')] [2024-09-01 15:27:01,620][03034] Updated weights for policy 0, policy_version 480 (0.1911) [2024-09-01 15:27:05,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1966080. Throughput: 0: 224.6. Samples: 493920. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:05,143][00194] Avg episode reward: [(0, '12.115')] [2024-09-01 15:27:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 888.6). Total num frames: 1970176. Throughput: 0: 216.1. Samples: 494220. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:10,138][00194] Avg episode reward: [(0, '12.094')] [2024-09-01 15:27:15,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 1974272. Throughput: 0: 225.1. Samples: 495788. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:15,144][00194] Avg episode reward: [(0, '12.213')] [2024-09-01 15:27:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1982464. Throughput: 0: 214.5. Samples: 496664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:20,139][00194] Avg episode reward: [(0, '12.239')] [2024-09-01 15:27:25,137][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 1986560. Throughput: 0: 225.0. Samples: 497810. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:25,144][00194] Avg episode reward: [(0, '12.707')] [2024-09-01 15:27:29,334][03021] Saving new best policy, reward=12.707! [2024-09-01 15:27:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1990656. Throughput: 0: 224.4. Samples: 499190. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:27:30,138][00194] Avg episode reward: [(0, '12.989')] [2024-09-01 15:27:33,268][03021] Saving new best policy, reward=12.989! 
[2024-09-01 15:27:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 1994752. Throughput: 0: 236.7. Samples: 500796. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:27:35,142][00194] Avg episode reward: [(0, '13.343')] [2024-09-01 15:27:37,268][03021] Saving new best policy, reward=13.343! [2024-09-01 15:27:40,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 1998848. Throughput: 0: 226.3. Samples: 501308. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:27:40,142][00194] Avg episode reward: [(0, '13.433')] [2024-09-01 15:27:43,086][03021] Saving new best policy, reward=13.433! [2024-09-01 15:27:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2002944. Throughput: 0: 215.4. Samples: 502342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:27:45,139][00194] Avg episode reward: [(0, '13.881')] [2024-09-01 15:27:47,477][03021] Saving new best policy, reward=13.881! [2024-09-01 15:27:47,484][03034] Updated weights for policy 0, policy_version 490 (0.1174) [2024-09-01 15:27:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2007040. Throughput: 0: 227.7. Samples: 504166. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:27:50,141][00194] Avg episode reward: [(0, '14.077')] [2024-09-01 15:27:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2011136. Throughput: 0: 235.4. Samples: 504812. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:27:55,139][00194] Avg episode reward: [(0, '14.071')] [2024-09-01 15:27:55,360][03021] Saving new best policy, reward=14.077! [2024-09-01 15:28:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2015232. Throughput: 0: 229.2. Samples: 506100. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:28:00,139][00194] Avg episode reward: [(0, '14.053')] [2024-09-01 15:28:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2019328. Throughput: 0: 238.3. Samples: 507386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:05,139][00194] Avg episode reward: [(0, '13.822')] [2024-09-01 15:28:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2027520. Throughput: 0: 231.3. Samples: 508216. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:28:10,138][00194] Avg episode reward: [(0, '13.810')] [2024-09-01 15:28:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2031616. Throughput: 0: 227.8. Samples: 509440. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:28:15,139][00194] Avg episode reward: [(0, '14.038')] [2024-09-01 15:28:19,560][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth... [2024-09-01 15:28:19,678][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000444_1818624.pth [2024-09-01 15:28:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2035712. Throughput: 0: 214.7. Samples: 510456. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:20,139][00194] Avg episode reward: [(0, '13.959')] [2024-09-01 15:28:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2039808. Throughput: 0: 226.7. Samples: 511510. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:25,139][00194] Avg episode reward: [(0, '14.159')] [2024-09-01 15:28:27,379][03021] Saving new best policy, reward=14.159! [2024-09-01 15:28:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2043904. Throughput: 0: 237.9. Samples: 513046. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:30,145][00194] Avg episode reward: [(0, '13.950')] [2024-09-01 15:28:31,832][03034] Updated weights for policy 0, policy_version 500 (0.0071) [2024-09-01 15:28:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2048000. Throughput: 0: 224.8. Samples: 514282. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:35,140][00194] Avg episode reward: [(0, '13.903')] [2024-09-01 15:28:35,815][03021] Signal inference workers to stop experience collection... (500 times) [2024-09-01 15:28:35,865][03034] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-09-01 15:28:37,609][03021] Signal inference workers to resume experience collection... (500 times) [2024-09-01 15:28:37,610][03034] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-09-01 15:28:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2052096. Throughput: 0: 217.3. Samples: 514592. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:28:40,139][00194] Avg episode reward: [(0, '14.125')] [2024-09-01 15:28:45,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2056192. Throughput: 0: 230.9. Samples: 516492. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:28:45,141][00194] Avg episode reward: [(0, '13.500')] [2024-09-01 15:28:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2064384. Throughput: 0: 228.4. Samples: 517662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:50,143][00194] Avg episode reward: [(0, '13.957')] [2024-09-01 15:28:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2064384. Throughput: 0: 224.3. Samples: 518310. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:28:55,138][00194] Avg episode reward: [(0, '13.801')] [2024-09-01 15:29:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2072576. Throughput: 0: 227.5. Samples: 519678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:29:00,139][00194] Avg episode reward: [(0, '13.794')] [2024-09-01 15:29:05,136][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2076672. Throughput: 0: 241.3. Samples: 521314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:05,141][00194] Avg episode reward: [(0, '13.400')] [2024-09-01 15:29:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2080768. Throughput: 0: 227.2. Samples: 521736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:10,141][00194] Avg episode reward: [(0, '13.401')] [2024-09-01 15:29:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2084864. Throughput: 0: 215.9. Samples: 522762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:15,147][00194] Avg episode reward: [(0, '13.225')] [2024-09-01 15:29:17,868][03034] Updated weights for policy 0, policy_version 510 (0.1507) [2024-09-01 15:29:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2088960. Throughput: 0: 228.1. Samples: 524548. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:20,145][00194] Avg episode reward: [(0, '12.847')] [2024-09-01 15:29:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2093056. Throughput: 0: 240.8. Samples: 525426. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:25,138][00194] Avg episode reward: [(0, '13.575')] [2024-09-01 15:29:30,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2097152. Throughput: 0: 221.0. Samples: 526438. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:30,141][00194] Avg episode reward: [(0, '13.734')] [2024-09-01 15:29:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2101248. Throughput: 0: 228.0. Samples: 527920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:35,138][00194] Avg episode reward: [(0, '13.735')] [2024-09-01 15:29:40,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2109440. Throughput: 0: 235.6. Samples: 528910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:40,145][00194] Avg episode reward: [(0, '13.145')] [2024-09-01 15:29:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2113536. Throughput: 0: 227.3. Samples: 529906. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:45,144][00194] Avg episode reward: [(0, '13.358')] [2024-09-01 15:29:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2117632. Throughput: 0: 215.3. Samples: 531004. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:50,148][00194] Avg episode reward: [(0, '13.130')] [2024-09-01 15:29:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2121728. Throughput: 0: 228.3. Samples: 532010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:29:55,141][00194] Avg episode reward: [(0, '12.993')] [2024-09-01 15:30:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2125824. Throughput: 0: 241.3. Samples: 533622. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:00,147][00194] Avg episode reward: [(0, '12.821')] [2024-09-01 15:30:02,181][03034] Updated weights for policy 0, policy_version 520 (0.0530) [2024-09-01 15:30:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2129920. Throughput: 0: 223.8. Samples: 534620. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:05,139][00194] Avg episode reward: [(0, '12.767')] [2024-09-01 15:30:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2134016. Throughput: 0: 214.4. Samples: 535076. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:10,140][00194] Avg episode reward: [(0, '12.534')] [2024-09-01 15:30:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2138112. Throughput: 0: 233.9. Samples: 536962. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:15,139][00194] Avg episode reward: [(0, '12.672')] [2024-09-01 15:30:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2142208. Throughput: 0: 225.6. Samples: 538074. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:20,140][00194] Avg episode reward: [(0, '12.575')] [2024-09-01 15:30:20,716][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth... [2024-09-01 15:30:20,864][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000470_1925120.pth [2024-09-01 15:30:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2146304. Throughput: 0: 214.1. Samples: 538544. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:25,147][00194] Avg episode reward: [(0, '12.826')] [2024-09-01 15:30:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 2154496. Throughput: 0: 227.7. Samples: 540154. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:30,139][00194] Avg episode reward: [(0, '13.402')] [2024-09-01 15:30:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2158592. Throughput: 0: 236.4. Samples: 541642. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:35,143][00194] Avg episode reward: [(0, '13.861')] [2024-09-01 15:30:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2162688. Throughput: 0: 227.0. Samples: 542226. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:30:40,144][00194] Avg episode reward: [(0, '13.827')] [2024-09-01 15:30:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2166784. Throughput: 0: 214.1. Samples: 543256. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:30:45,138][00194] Avg episode reward: [(0, '14.206')] [2024-09-01 15:30:47,657][03021] Saving new best policy, reward=14.206! [2024-09-01 15:30:47,662][03034] Updated weights for policy 0, policy_version 530 (0.2030) [2024-09-01 15:30:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2170880. Throughput: 0: 232.8. Samples: 545094. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:30:50,145][00194] Avg episode reward: [(0, '14.067')] [2024-09-01 15:30:55,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2174976. Throughput: 0: 235.2. Samples: 545660. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:30:55,148][00194] Avg episode reward: [(0, '14.558')] [2024-09-01 15:31:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2179072. Throughput: 0: 215.9. Samples: 546676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:31:00,139][00194] Avg episode reward: [(0, '14.374')] [2024-09-01 15:31:01,840][03021] Saving new best policy, reward=14.558! [2024-09-01 15:31:05,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2183168. Throughput: 0: 230.2. Samples: 548432. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:05,138][00194] Avg episode reward: [(0, '15.056')] [2024-09-01 15:31:09,508][03021] Saving new best policy, reward=15.056! [2024-09-01 15:31:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2191360. Throughput: 0: 236.4. Samples: 549184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:10,144][00194] Avg episode reward: [(0, '15.075')] [2024-09-01 15:31:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2191360. Throughput: 0: 228.0. Samples: 550412. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:15,142][00194] Avg episode reward: [(0, '14.909')] [2024-09-01 15:31:15,359][03021] Saving new best policy, reward=15.075! [2024-09-01 15:31:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2199552. Throughput: 0: 218.2. Samples: 551462. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:20,140][00194] Avg episode reward: [(0, '15.050')] [2024-09-01 15:31:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2203648. Throughput: 0: 225.3. Samples: 552364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:25,139][00194] Avg episode reward: [(0, '14.934')] [2024-09-01 15:31:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2207744. Throughput: 0: 229.3. Samples: 553574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:30,138][00194] Avg episode reward: [(0, '15.070')] [2024-09-01 15:31:33,184][03034] Updated weights for policy 0, policy_version 540 (0.1215) [2024-09-01 15:31:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2211840. Throughput: 0: 216.8. Samples: 554852. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:35,139][00194] Avg episode reward: [(0, '15.499')] [2024-09-01 15:31:37,828][03021] Saving new best policy, reward=15.499! [2024-09-01 15:31:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2215936. Throughput: 0: 219.5. Samples: 555536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:40,143][00194] Avg episode reward: [(0, '16.104')] [2024-09-01 15:31:41,799][03021] Saving new best policy, reward=16.104! [2024-09-01 15:31:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2220032. Throughput: 0: 236.4. Samples: 557316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:45,147][00194] Avg episode reward: [(0, '16.763')] [2024-09-01 15:31:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2224128. Throughput: 0: 220.5. Samples: 558356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:50,144][00194] Avg episode reward: [(0, '16.703')] [2024-09-01 15:31:51,298][03021] Saving new best policy, reward=16.763! [2024-09-01 15:31:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2228224. Throughput: 0: 216.8. Samples: 558938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:31:55,138][00194] Avg episode reward: [(0, '16.208')] [2024-09-01 15:32:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2236416. Throughput: 0: 229.5. Samples: 560740. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:00,138][00194] Avg episode reward: [(0, '16.198')] [2024-09-01 15:32:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2240512. Throughput: 0: 234.5. Samples: 562014. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:32:05,143][00194] Avg episode reward: [(0, '16.078')] [2024-09-01 15:32:10,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 2244608. Throughput: 0: 230.2. Samples: 562722. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:32:10,146][00194] Avg episode reward: [(0, '16.188')] [2024-09-01 15:32:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2248704. Throughput: 0: 224.6. Samples: 563680. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:32:15,143][00194] Avg episode reward: [(0, '16.064')] [2024-09-01 15:32:17,536][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth... [2024-09-01 15:32:17,541][03034] Updated weights for policy 0, policy_version 550 (0.0679) [2024-09-01 15:32:17,645][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth [2024-09-01 15:32:19,875][03021] Signal inference workers to stop experience collection... (550 times) [2024-09-01 15:32:19,925][03034] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-09-01 15:32:20,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2252800. Throughput: 0: 238.3. Samples: 565576. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:32:20,147][00194] Avg episode reward: [(0, '15.370')] [2024-09-01 15:32:21,340][03021] Signal inference workers to resume experience collection... (550 times) [2024-09-01 15:32:21,342][03034] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-09-01 15:32:25,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2256896. Throughput: 0: 230.9. Samples: 565930. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:25,147][00194] Avg episode reward: [(0, '15.204')] [2024-09-01 15:32:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2260992. Throughput: 0: 218.4. Samples: 567144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:30,140][00194] Avg episode reward: [(0, '14.805')] [2024-09-01 15:32:35,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2265088. Throughput: 0: 233.0. Samples: 568840. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:35,141][00194] Avg episode reward: [(0, '14.463')] [2024-09-01 15:32:40,141][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 2273280. Throughput: 0: 235.0. Samples: 569516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:40,143][00194] Avg episode reward: [(0, '14.651')] [2024-09-01 15:32:45,142][00194] Fps is (10 sec: 1228.0, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 2277376. Throughput: 0: 225.2. Samples: 570874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:45,145][00194] Avg episode reward: [(0, '14.633')] [2024-09-01 15:32:50,138][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2281472. Throughput: 0: 219.1. Samples: 571872. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:50,147][00194] Avg episode reward: [(0, '14.797')] [2024-09-01 15:32:55,136][00194] Fps is (10 sec: 819.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2285568. Throughput: 0: 226.4. Samples: 572908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:32:55,143][00194] Avg episode reward: [(0, '14.419')] [2024-09-01 15:33:00,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2289664. Throughput: 0: 235.5. Samples: 574278. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:00,142][00194] Avg episode reward: [(0, '14.241')] [2024-09-01 15:33:03,758][03034] Updated weights for policy 0, policy_version 560 (0.2684) [2024-09-01 15:33:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2293760. Throughput: 0: 213.2. Samples: 575170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:05,143][00194] Avg episode reward: [(0, '13.690')] [2024-09-01 15:33:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2297856. Throughput: 0: 223.6. Samples: 575992. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:10,147][00194] Avg episode reward: [(0, '13.307')] [2024-09-01 15:33:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2301952. Throughput: 0: 240.2. Samples: 577954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:15,145][00194] Avg episode reward: [(0, '13.731')] [2024-09-01 15:33:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2306048. Throughput: 0: 227.3. Samples: 579070. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:20,141][00194] Avg episode reward: [(0, '13.223')] [2024-09-01 15:33:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2310144. Throughput: 0: 221.4. Samples: 579478. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:25,144][00194] Avg episode reward: [(0, '13.440')] [2024-09-01 15:33:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2318336. Throughput: 0: 228.3. Samples: 581144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:30,139][00194] Avg episode reward: [(0, '13.476')] [2024-09-01 15:33:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2322432. Throughput: 0: 237.3. Samples: 582552. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:33:35,146][00194] Avg episode reward: [(0, '13.612')] [2024-09-01 15:33:40,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2326528. Throughput: 0: 228.6. Samples: 583194. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:33:40,145][00194] Avg episode reward: [(0, '13.687')] [2024-09-01 15:33:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2330624. Throughput: 0: 222.4. Samples: 584286. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:45,139][00194] Avg episode reward: [(0, '14.422')] [2024-09-01 15:33:47,104][03034] Updated weights for policy 0, policy_version 570 (0.0054) [2024-09-01 15:33:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2334720. Throughput: 0: 246.4. Samples: 586260. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:50,146][00194] Avg episode reward: [(0, '14.781')] [2024-09-01 15:33:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2338816. Throughput: 0: 237.9. Samples: 586696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:33:55,138][00194] Avg episode reward: [(0, '14.698')] [2024-09-01 15:34:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2342912. Throughput: 0: 218.3. Samples: 587778. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:00,147][00194] Avg episode reward: [(0, '15.246')] [2024-09-01 15:34:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2351104. Throughput: 0: 228.6. Samples: 589356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:05,138][00194] Avg episode reward: [(0, '15.118')] [2024-09-01 15:34:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2351104. Throughput: 0: 235.2. Samples: 590064. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:10,139][00194] Avg episode reward: [(0, '14.992')] [2024-09-01 15:34:15,139][00194] Fps is (10 sec: 409.5, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2355200. Throughput: 0: 206.6. Samples: 590442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:15,152][00194] Avg episode reward: [(0, '14.992')] [2024-09-01 15:34:20,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2355200. Throughput: 0: 196.9. Samples: 591414. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:20,138][00194] Avg episode reward: [(0, '14.807')] [2024-09-01 15:34:21,310][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth... [2024-09-01 15:34:21,418][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth [2024-09-01 15:34:25,136][00194] Fps is (10 sec: 409.7, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2359296. Throughput: 0: 194.6. Samples: 591950. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:34:25,146][00194] Avg episode reward: [(0, '14.632')] [2024-09-01 15:34:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2367488. Throughput: 0: 203.6. Samples: 593446. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:34:30,138][00194] Avg episode reward: [(0, '14.973')] [2024-09-01 15:34:35,137][00194] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2371584. Throughput: 0: 185.0. Samples: 594584. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:34:35,143][00194] Avg episode reward: [(0, '14.855')] [2024-09-01 15:34:39,882][03034] Updated weights for policy 0, policy_version 580 (0.0564) [2024-09-01 15:34:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2375680. Throughput: 0: 190.8. Samples: 595282. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:40,144][00194] Avg episode reward: [(0, '14.892')] [2024-09-01 15:34:45,136][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2379776. Throughput: 0: 196.6. Samples: 596624. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:45,140][00194] Avg episode reward: [(0, '15.959')] [2024-09-01 15:34:50,139][00194] Fps is (10 sec: 818.9, 60 sec: 819.1, 300 sec: 888.6). Total num frames: 2383872. Throughput: 0: 194.1. Samples: 598090. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:50,142][00194] Avg episode reward: [(0, '16.658')] [2024-09-01 15:34:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2387968. Throughput: 0: 189.3. Samples: 598584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:34:55,138][00194] Avg episode reward: [(0, '16.490')] [2024-09-01 15:35:00,136][00194] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2392064. Throughput: 0: 213.8. Samples: 600064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:35:00,146][00194] Avg episode reward: [(0, '16.349')] [2024-09-01 15:35:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2400256. Throughput: 0: 228.1. Samples: 601678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:05,145][00194] Avg episode reward: [(0, '16.273')] [2024-09-01 15:35:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2400256. Throughput: 0: 232.9. Samples: 602432. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:10,139][00194] Avg episode reward: [(0, '15.851')] [2024-09-01 15:35:15,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2404352. Throughput: 0: 218.2. Samples: 603266. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:15,145][00194] Avg episode reward: [(0, '16.457')] [2024-09-01 15:35:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2412544. Throughput: 0: 229.7. Samples: 604922. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:20,140][00194] Avg episode reward: [(0, '16.213')] [2024-09-01 15:35:22,868][03034] Updated weights for policy 0, policy_version 590 (0.1754) [2024-09-01 15:35:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2416640. Throughput: 0: 232.8. Samples: 605760. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:25,141][00194] Avg episode reward: [(0, '16.573')] [2024-09-01 15:35:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2420736. Throughput: 0: 228.1. Samples: 606888. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:30,139][00194] Avg episode reward: [(0, '16.385')] [2024-09-01 15:35:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2424832. Throughput: 0: 230.5. Samples: 608460. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:35,144][00194] Avg episode reward: [(0, '16.451')] [2024-09-01 15:35:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2428928. Throughput: 0: 235.0. Samples: 609160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:40,145][00194] Avg episode reward: [(0, '16.384')] [2024-09-01 15:35:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2433024. Throughput: 0: 232.8. Samples: 610542. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:45,146][00194] Avg episode reward: [(0, '17.090')] [2024-09-01 15:35:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2437120. Throughput: 0: 224.0. Samples: 611756. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:35:50,149][00194] Avg episode reward: [(0, '17.433')] [2024-09-01 15:35:51,015][03021] Saving new best policy, reward=17.090! [2024-09-01 15:35:54,856][03021] Saving new best policy, reward=17.433! [2024-09-01 15:35:55,142][00194] Fps is (10 sec: 1228.1, 60 sec: 955.6, 300 sec: 902.5). Total num frames: 2445312. Throughput: 0: 224.5. Samples: 612538. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:35:55,151][00194] Avg episode reward: [(0, '17.449')] [2024-09-01 15:35:58,768][03021] Saving new best policy, reward=17.449! [2024-09-01 15:36:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2449408. Throughput: 0: 238.0. Samples: 613974. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:36:00,143][00194] Avg episode reward: [(0, '17.573')] [2024-09-01 15:36:04,424][03021] Saving new best policy, reward=17.573! [2024-09-01 15:36:05,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2453504. Throughput: 0: 225.2. Samples: 615056. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:36:05,138][00194] Avg episode reward: [(0, '17.573')] [2024-09-01 15:36:09,370][03034] Updated weights for policy 0, policy_version 600 (0.0525) [2024-09-01 15:36:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2457600. Throughput: 0: 223.4. Samples: 615814. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:10,140][00194] Avg episode reward: [(0, '17.511')] [2024-09-01 15:36:11,675][03021] Signal inference workers to stop experience collection... (600 times) [2024-09-01 15:36:11,733][03034] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-09-01 15:36:13,146][03021] Signal inference workers to resume experience collection... 
(600 times) [2024-09-01 15:36:13,149][03034] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-09-01 15:36:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2461696. Throughput: 0: 227.2. Samples: 617114. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:15,146][00194] Avg episode reward: [(0, '17.478')] [2024-09-01 15:36:17,116][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000602_2465792.pth... [2024-09-01 15:36:17,234][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth [2024-09-01 15:36:20,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2465792. Throughput: 0: 226.9. Samples: 618672. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:20,141][00194] Avg episode reward: [(0, '17.277')] [2024-09-01 15:36:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2469888. Throughput: 0: 221.2. Samples: 619116. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:25,138][00194] Avg episode reward: [(0, '17.390')] [2024-09-01 15:36:30,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2473984. Throughput: 0: 221.9. Samples: 620528. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:30,152][00194] Avg episode reward: [(0, '16.622')] [2024-09-01 15:36:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2478080. Throughput: 0: 231.1. Samples: 622156. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:35,138][00194] Avg episode reward: [(0, '17.234')] [2024-09-01 15:36:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2482176. Throughput: 0: 225.5. Samples: 622684. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:40,139][00194] Avg episode reward: [(0, '17.375')] [2024-09-01 15:36:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2486272. Throughput: 0: 216.7. Samples: 623726. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:36:45,149][00194] Avg episode reward: [(0, '17.829')] [2024-09-01 15:36:49,590][03021] Saving new best policy, reward=17.829! [2024-09-01 15:36:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2494464. Throughput: 0: 227.0. Samples: 625272. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:50,138][00194] Avg episode reward: [(0, '17.889')] [2024-09-01 15:36:53,746][03021] Saving new best policy, reward=17.889! [2024-09-01 15:36:53,783][03034] Updated weights for policy 0, policy_version 610 (0.1219) [2024-09-01 15:36:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2498560. Throughput: 0: 232.4. Samples: 626274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:36:55,139][00194] Avg episode reward: [(0, '17.831')] [2024-09-01 15:37:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2502656. Throughput: 0: 226.3. Samples: 627296. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:00,139][00194] Avg episode reward: [(0, '18.390')] [2024-09-01 15:37:03,909][03021] Saving new best policy, reward=18.390! [2024-09-01 15:37:05,137][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2506752. Throughput: 0: 220.3. Samples: 628584. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:05,144][00194] Avg episode reward: [(0, '18.006')] [2024-09-01 15:37:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2510848. Throughput: 0: 227.5. Samples: 629352. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:10,146][00194] Avg episode reward: [(0, '18.789')] [2024-09-01 15:37:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2514944. Throughput: 0: 226.4. Samples: 630718. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:15,145][00194] Avg episode reward: [(0, '18.796')] [2024-09-01 15:37:17,807][03021] Saving new best policy, reward=18.789! [2024-09-01 15:37:17,936][03021] Saving new best policy, reward=18.796! [2024-09-01 15:37:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2519040. Throughput: 0: 216.6. Samples: 631904. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:20,138][00194] Avg episode reward: [(0, '18.491')] [2024-09-01 15:37:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2523136. Throughput: 0: 218.2. Samples: 632502. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:25,138][00194] Avg episode reward: [(0, '18.033')] [2024-09-01 15:37:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2527232. Throughput: 0: 235.9. Samples: 634340. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:30,140][00194] Avg episode reward: [(0, '18.248')] [2024-09-01 15:37:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.8). Total num frames: 2531328. Throughput: 0: 224.7. Samples: 635384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:35,158][00194] Avg episode reward: [(0, '18.016')] [2024-09-01 15:37:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.8). Total num frames: 2535424. Throughput: 0: 214.1. Samples: 635908. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:40,139][00194] Avg episode reward: [(0, '18.389')] [2024-09-01 15:37:41,074][03034] Updated weights for policy 0, policy_version 620 (0.1017) [2024-09-01 15:37:45,137][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2543616. Throughput: 0: 229.6. Samples: 637630. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:37:45,140][00194] Avg episode reward: [(0, '17.935')] [2024-09-01 15:37:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2547712. Throughput: 0: 225.9. Samples: 638748. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:37:50,144][00194] Avg episode reward: [(0, '17.932')] [2024-09-01 15:37:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2551808. Throughput: 0: 223.8. Samples: 639422. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:37:55,140][00194] Avg episode reward: [(0, '17.345')] [2024-09-01 15:38:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2555904. Throughput: 0: 221.8. Samples: 640700. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:00,140][00194] Avg episode reward: [(0, '17.246')] [2024-09-01 15:38:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2560000. Throughput: 0: 237.5. Samples: 642592. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:05,138][00194] Avg episode reward: [(0, '17.003')] [2024-09-01 15:38:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2564096. Throughput: 0: 228.9. Samples: 642804. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:10,139][00194] Avg episode reward: [(0, '16.978')] [2024-09-01 15:38:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2568192. Throughput: 0: 213.9. Samples: 643964. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:38:15,141][00194] Avg episode reward: [(0, '16.547')] [2024-09-01 15:38:19,957][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth... [2024-09-01 15:38:20,074][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth [2024-09-01 15:38:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2576384. Throughput: 0: 228.1. Samples: 645646. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:20,144][00194] Avg episode reward: [(0, '16.768')] [2024-09-01 15:38:24,865][03034] Updated weights for policy 0, policy_version 630 (0.1022) [2024-09-01 15:38:25,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2580480. Throughput: 0: 237.9. Samples: 646612. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:25,146][00194] Avg episode reward: [(0, '16.866')] [2024-09-01 15:38:30,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2580480. Throughput: 0: 221.6. Samples: 647600. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:30,146][00194] Avg episode reward: [(0, '16.842')] [2024-09-01 15:38:35,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.8, 300 sec: 888.6). Total num frames: 2588672. Throughput: 0: 227.3. Samples: 648976. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:38:35,143][00194] Avg episode reward: [(0, '16.657')] [2024-09-01 15:38:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 2592768. Throughput: 0: 229.5. Samples: 649750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:38:40,143][00194] Avg episode reward: [(0, '16.729')] [2024-09-01 15:38:45,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2596864. Throughput: 0: 228.4. Samples: 650980. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:45,150][00194] Avg episode reward: [(0, '16.740')] [2024-09-01 15:38:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2600960. Throughput: 0: 217.1. Samples: 652362. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:38:50,139][00194] Avg episode reward: [(0, '17.368')] [2024-09-01 15:38:55,136][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2605056. Throughput: 0: 228.0. Samples: 653062. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:38:55,143][00194] Avg episode reward: [(0, '17.468')] [2024-09-01 15:39:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 2609152. Throughput: 0: 240.8. Samples: 654800. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:00,141][00194] Avg episode reward: [(0, '17.162')] [2024-09-01 15:39:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2613248. Throughput: 0: 224.4. Samples: 655744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:05,142][00194] Avg episode reward: [(0, '17.014')] [2024-09-01 15:39:09,782][03034] Updated weights for policy 0, policy_version 640 (0.0048) [2024-09-01 15:39:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2621440. Throughput: 0: 219.4. Samples: 656484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:10,141][00194] Avg episode reward: [(0, '17.488')] [2024-09-01 15:39:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2625536. Throughput: 0: 228.6. Samples: 657888. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:15,141][00194] Avg episode reward: [(0, '17.710')] [2024-09-01 15:39:20,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 2625536. Throughput: 0: 220.0. Samples: 658876. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:20,141][00194] Avg episode reward: [(0, '18.127')] [2024-09-01 15:39:25,136][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2629632. Throughput: 0: 213.5. Samples: 659356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:39:25,145][00194] Avg episode reward: [(0, '18.380')] [2024-09-01 15:39:30,136][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2637824. Throughput: 0: 220.2. Samples: 660886. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:30,145][00194] Avg episode reward: [(0, '18.273')] [2024-09-01 15:39:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2641920. Throughput: 0: 222.2. Samples: 662362. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:35,144][00194] Avg episode reward: [(0, '17.777')] [2024-09-01 15:39:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2646016. Throughput: 0: 219.1. Samples: 662922. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:40,139][00194] Avg episode reward: [(0, '18.493')] [2024-09-01 15:39:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 2650112. Throughput: 0: 204.6. Samples: 664006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:45,139][00194] Avg episode reward: [(0, '18.301')] [2024-09-01 15:39:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2654208. Throughput: 0: 228.3. Samples: 666016. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:50,146][00194] Avg episode reward: [(0, '18.586')] [2024-09-01 15:39:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2658304. Throughput: 0: 220.9. Samples: 666426. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:39:55,142][00194] Avg episode reward: [(0, '19.128')] [2024-09-01 15:39:56,763][03021] Saving new best policy, reward=19.128! [2024-09-01 15:39:56,753][03034] Updated weights for policy 0, policy_version 650 (0.1637) [2024-09-01 15:40:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2662400. Throughput: 0: 215.0. Samples: 667564. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:40:00,138][00194] Avg episode reward: [(0, '19.077')] [2024-09-01 15:40:00,191][03021] Signal inference workers to stop experience collection... (650 times) [2024-09-01 15:40:00,249][03034] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-09-01 15:40:01,517][03021] Signal inference workers to resume experience collection... (650 times) [2024-09-01 15:40:01,519][03034] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-09-01 15:40:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2666496. Throughput: 0: 228.1. Samples: 669140. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:40:05,138][00194] Avg episode reward: [(0, '18.654')] [2024-09-01 15:40:10,136][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 2674688. Throughput: 0: 237.8. Samples: 670056. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:40:10,143][00194] Avg episode reward: [(0, '18.396')] [2024-09-01 15:40:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2678784. Throughput: 0: 225.3. Samples: 671024. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:40:15,141][00194] Avg episode reward: [(0, '17.993')] [2024-09-01 15:40:19,739][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000655_2682880.pth... 
[2024-09-01 15:40:19,847][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000602_2465792.pth
[2024-09-01 15:40:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2682880. Throughput: 0: 218.4. Samples: 672188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:20,143][00194] Avg episode reward: [(0, '17.373')]
[2024-09-01 15:40:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2686976. Throughput: 0: 228.8. Samples: 673218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:25,138][00194] Avg episode reward: [(0, '17.249')]
[2024-09-01 15:40:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2691072. Throughput: 0: 235.5. Samples: 674604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:30,145][00194] Avg episode reward: [(0, '17.045')]
[2024-09-01 15:40:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2695168. Throughput: 0: 214.0. Samples: 675644. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:35,142][00194] Avg episode reward: [(0, '17.239')]
[2024-09-01 15:40:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2699264. Throughput: 0: 219.5. Samples: 676302. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:40,142][00194] Avg episode reward: [(0, '17.996')]
[2024-09-01 15:40:42,042][03034] Updated weights for policy 0, policy_version 660 (0.2248)
[2024-09-01 15:40:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2703360. Throughput: 0: 232.5. Samples: 678028. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:45,145][00194] Avg episode reward: [(0, '17.965')]
[2024-09-01 15:40:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2707456. Throughput: 0: 224.2. Samples: 679230. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:40:50,142][00194] Avg episode reward: [(0, '17.388')]
[2024-09-01 15:40:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2711552. Throughput: 0: 215.1. Samples: 679734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:40:55,141][00194] Avg episode reward: [(0, '17.932')]
[2024-09-01 15:41:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2715648. Throughput: 0: 226.7. Samples: 681224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:41:00,148][00194] Avg episode reward: [(0, '17.388')]
[2024-09-01 15:41:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2719744. Throughput: 0: 228.8. Samples: 682486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:05,138][00194] Avg episode reward: [(0, '17.630')]
[2024-09-01 15:41:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2723840. Throughput: 0: 215.4. Samples: 682910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:10,143][00194] Avg episode reward: [(0, '17.784')]
[2024-09-01 15:41:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 2727936. Throughput: 0: 211.7. Samples: 684130. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:15,145][00194] Avg episode reward: [(0, '18.055')]
[2024-09-01 15:41:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2736128. Throughput: 0: 222.1. Samples: 685640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:20,147][00194] Avg episode reward: [(0, '18.308')]
[2024-09-01 15:41:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2740224. Throughput: 0: 228.4. Samples: 686580. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:25,142][00194] Avg episode reward: [(0, '18.215')]
[2024-09-01 15:41:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2744320. Throughput: 0: 212.9. Samples: 687610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:30,146][00194] Avg episode reward: [(0, '17.663')]
[2024-09-01 15:41:30,168][03034] Updated weights for policy 0, policy_version 670 (0.2184)
[2024-09-01 15:41:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2748416. Throughput: 0: 219.0. Samples: 689086. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:35,144][00194] Avg episode reward: [(0, '17.081')]
[2024-09-01 15:41:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2752512. Throughput: 0: 222.7. Samples: 689756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:40,138][00194] Avg episode reward: [(0, '17.206')]
[2024-09-01 15:41:45,136][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2756608. Throughput: 0: 217.5. Samples: 691010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:45,145][00194] Avg episode reward: [(0, '17.206')]
[2024-09-01 15:41:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2760704. Throughput: 0: 217.5. Samples: 692274. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:50,146][00194] Avg episode reward: [(0, '17.321')]
[2024-09-01 15:41:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2764800. Throughput: 0: 227.1. Samples: 693128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 15:41:55,141][00194] Avg episode reward: [(0, '17.597')]
[2024-09-01 15:42:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2768896. Throughput: 0: 235.4. Samples: 694724. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:00,144][00194] Avg episode reward: [(0, '17.215')]
[2024-09-01 15:42:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2772992. Throughput: 0: 224.6. Samples: 695746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:05,138][00194] Avg episode reward: [(0, '17.520')]
[2024-09-01 15:42:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2781184. Throughput: 0: 217.6. Samples: 696372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:42:10,142][00194] Avg episode reward: [(0, '17.605')]
[2024-09-01 15:42:13,787][03034] Updated weights for policy 0, policy_version 680 (0.0583)
[2024-09-01 15:42:15,139][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2785280. Throughput: 0: 225.9. Samples: 697778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:42:15,141][00194] Avg episode reward: [(0, '16.796')]
[2024-09-01 15:42:19,309][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000681_2789376.pth...
[2024-09-01 15:42:19,422][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth
[2024-09-01 15:42:20,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2789376. Throughput: 0: 217.9. Samples: 698890. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 15:42:20,141][00194] Avg episode reward: [(0, '16.508')]
[2024-09-01 15:42:25,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2793472. Throughput: 0: 219.1. Samples: 699614. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:25,144][00194] Avg episode reward: [(0, '17.590')]
[2024-09-01 15:42:30,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2797568. Throughput: 0: 222.2. Samples: 701008. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 15:42:30,143][00194] Avg episode reward: [(0, '16.690')]
[2024-09-01 15:42:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2801664. Throughput: 0: 230.4. Samples: 702642. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:35,139][00194] Avg episode reward: [(0, '16.547')]
[2024-09-01 15:42:40,140][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 2805760. Throughput: 0: 217.7. Samples: 702926. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:40,146][00194] Avg episode reward: [(0, '16.577')]
[2024-09-01 15:42:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2809856. Throughput: 0: 216.4. Samples: 704460. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:45,138][00194] Avg episode reward: [(0, '16.622')]
[2024-09-01 15:42:50,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2813952. Throughput: 0: 231.8. Samples: 706178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:50,147][00194] Avg episode reward: [(0, '16.403')]
[2024-09-01 15:42:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2818048. Throughput: 0: 228.3. Samples: 706644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:42:55,139][00194] Avg episode reward: [(0, '16.105')]
[2024-09-01 15:43:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2822144. Throughput: 0: 222.1. Samples: 707770. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 15:43:00,144][00194] Avg episode reward: [(0, '16.257')]
[2024-09-01 15:43:01,308][03034] Updated weights for policy 0, policy_version 690 (0.0534)
[2024-09-01 15:43:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2830336. Throughput: 0: 231.7. Samples: 709314. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:43:05,139][00194] Avg episode reward: [(0, '16.304')]
[2024-09-01 15:43:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2834432. Throughput: 0: 238.5. Samples: 710348. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:43:10,143][00194] Avg episode reward: [(0, '16.727')]
[2024-09-01 15:43:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2838528. Throughput: 0: 231.1. Samples: 711408. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 15:43:15,140][00194] Avg episode reward: [(0, '17.210')]
[2024-09-01 15:43:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2842624. Throughput: 0: 219.7. Samples: 712528. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:20,138][00194] Avg episode reward: [(0, '18.390')]
[2024-09-01 15:43:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2846720. Throughput: 0: 233.8. Samples: 713446. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:25,139][00194] Avg episode reward: [(0, '18.378')]
[2024-09-01 15:43:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2850816. Throughput: 0: 234.7. Samples: 715020. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:30,145][00194] Avg episode reward: [(0, '18.375')]
[2024-09-01 15:43:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2854912. Throughput: 0: 220.0. Samples: 716076. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:35,139][00194] Avg episode reward: [(0, '18.811')]
[2024-09-01 15:43:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2859008. Throughput: 0: 222.2. Samples: 716644. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:40,147][00194] Avg episode reward: [(0, '18.913')]
[2024-09-01 15:43:44,829][03034] Updated weights for policy 0, policy_version 700 (0.2145)
[2024-09-01 15:43:45,138][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2867200. Throughput: 0: 240.0. Samples: 718572. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:45,146][00194] Avg episode reward: [(0, '19.036')]
[2024-09-01 15:43:48,737][03021] Signal inference workers to stop experience collection... (700 times)
[2024-09-01 15:43:48,830][03034] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2024-09-01 15:43:49,947][03021] Signal inference workers to resume experience collection... (700 times)
[2024-09-01 15:43:49,948][03034] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2024-09-01 15:43:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2871296. Throughput: 0: 228.0. Samples: 719576. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:50,139][00194] Avg episode reward: [(0, '19.069')]
[2024-09-01 15:43:55,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2875392. Throughput: 0: 218.9. Samples: 720200. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:43:55,139][00194] Avg episode reward: [(0, '19.063')]
[2024-09-01 15:44:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2879488. Throughput: 0: 226.7. Samples: 721610. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:00,139][00194] Avg episode reward: [(0, '20.121')]
[2024-09-01 15:44:02,307][03021] Saving new best policy, reward=20.121!
[2024-09-01 15:44:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2883584. Throughput: 0: 239.2. Samples: 723294. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:05,142][00194] Avg episode reward: [(0, '20.386')]
[2024-09-01 15:44:07,824][03021] Saving new best policy, reward=20.386!
[2024-09-01 15:44:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2887680. Throughput: 0: 225.6. Samples: 723600. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 15:44:10,144][00194] Avg episode reward: [(0, '20.153')]
[2024-09-01 15:44:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2891776. Throughput: 0: 217.6. Samples: 724810. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:15,139][00194] Avg episode reward: [(0, '20.230')]
[2024-09-01 15:44:16,410][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth...
[2024-09-01 15:44:16,520][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000655_2682880.pth
[2024-09-01 15:44:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2895872. Throughput: 0: 236.0. Samples: 726698. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 15:44:20,140][00194] Avg episode reward: [(0, '20.219')]
[2024-09-01 15:44:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2899968. Throughput: 0: 232.0. Samples: 727084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:44:25,142][00194] Avg episode reward: [(0, '19.611')]
[2024-09-01 15:44:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2904064. Throughput: 0: 217.2. Samples: 728346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 15:44:30,149][00194] Avg episode reward: [(0, '20.497')]
[2024-09-01 15:44:30,732][03034] Updated weights for policy 0, policy_version 710 (0.1086)
[2024-09-01 15:44:34,562][03021] Saving new best policy, reward=20.497!
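The `Saving new best policy, reward=...!` entries fire only when the averaged episode reward exceeds every previous value, which is why they cluster where the reward curve climbs. A minimal sketch of that bookkeeping, under the assumption that the learner simply compares against a running maximum (`BestPolicyTracker` is an illustrative name, not a Sample Factory class):

```python
class BestPolicyTracker:
    """Track the best average episode reward seen so far and report
    when a new 'best policy' checkpoint would be written."""

    def __init__(self):
        self.best_reward = float("-inf")

    def update(self, avg_reward: float) -> bool:
        # True exactly when this reward beats all previous ones,
        # i.e. when a "Saving new best policy" line would be logged.
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            return True
        return False
```

Replaying the rewards from this log through `update` reproduces the pattern above: saves at 19.128, then nothing until the reward climbs past it again around 20.121.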
[2024-09-01 15:44:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2912256. Throughput: 0: 226.9. Samples: 729788. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 15:44:35,143][00194] Avg episode reward: [(0, '20.699')] [2024-09-01 15:44:38,448][03021] Saving new best policy, reward=20.699! [2024-09-01 15:44:40,140][00194] Fps is (10 sec: 1228.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2916352. Throughput: 0: 235.6. Samples: 730804. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 15:44:40,146][00194] Avg episode reward: [(0, '20.725')] [2024-09-01 15:44:44,411][03021] Saving new best policy, reward=20.725! [2024-09-01 15:44:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2920448. Throughput: 0: 228.4. Samples: 731888. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) [2024-09-01 15:44:45,141][00194] Avg episode reward: [(0, '20.492')] [2024-09-01 15:44:50,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2924544. Throughput: 0: 219.4. Samples: 733168. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:44:50,139][00194] Avg episode reward: [(0, '20.748')] [2024-09-01 15:44:52,926][03021] Saving new best policy, reward=20.748! [2024-09-01 15:44:55,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2928640. Throughput: 0: 229.1. Samples: 733912. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:44:55,150][00194] Avg episode reward: [(0, '21.064')] [2024-09-01 15:44:57,026][03021] Saving new best policy, reward=21.064! [2024-09-01 15:45:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2932736. Throughput: 0: 233.4. Samples: 735312. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:00,142][00194] Avg episode reward: [(0, '20.611')] [2024-09-01 15:45:05,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 888.6). 
Total num frames: 2936832. Throughput: 0: 216.8. Samples: 736456. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:05,140][00194] Avg episode reward: [(0, '20.114')] [2024-09-01 15:45:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2940928. Throughput: 0: 223.9. Samples: 737160. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:10,141][00194] Avg episode reward: [(0, '19.898')] [2024-09-01 15:45:14,918][03034] Updated weights for policy 0, policy_version 720 (0.0056) [2024-09-01 15:45:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2949120. Throughput: 0: 237.5. Samples: 739032. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:15,139][00194] Avg episode reward: [(0, '20.473')] [2024-09-01 15:45:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 2949120. Throughput: 0: 228.0. Samples: 740048. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:20,139][00194] Avg episode reward: [(0, '20.979')] [2024-09-01 15:45:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2957312. Throughput: 0: 217.4. Samples: 740584. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:25,140][00194] Avg episode reward: [(0, '20.871')] [2024-09-01 15:45:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 2961408. Throughput: 0: 226.6. Samples: 742084. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:30,138][00194] Avg episode reward: [(0, '21.252')] [2024-09-01 15:45:32,582][03021] Saving new best policy, reward=21.252! [2024-09-01 15:45:35,139][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 2965504. Throughput: 0: 230.3. Samples: 743532. 
Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:35,143][00194] Avg episode reward: [(0, '21.427')] [2024-09-01 15:45:38,651][03021] Saving new best policy, reward=21.427! [2024-09-01 15:45:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2969600. Throughput: 0: 225.8. Samples: 744070. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) [2024-09-01 15:45:40,138][00194] Avg episode reward: [(0, '21.569')] [2024-09-01 15:45:42,848][03021] Saving new best policy, reward=21.569! [2024-09-01 15:45:45,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2973696. Throughput: 0: 223.2. Samples: 745354. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:45:45,144][00194] Avg episode reward: [(0, '21.570')] [2024-09-01 15:45:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2977792. Throughput: 0: 236.5. Samples: 747098. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:45:50,142][00194] Avg episode reward: [(0, '21.226')] [2024-09-01 15:45:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2981888. Throughput: 0: 230.2. Samples: 747520. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:45:55,142][00194] Avg episode reward: [(0, '21.167')] [2024-09-01 15:46:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 2985984. Throughput: 0: 214.4. Samples: 748678. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:00,144][00194] Avg episode reward: [(0, '21.265')] [2024-09-01 15:46:00,792][03034] Updated weights for policy 0, policy_version 730 (0.0059) [2024-09-01 15:46:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2994176. Throughput: 0: 225.2. Samples: 750184. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:46:05,148][00194] Avg episode reward: [(0, '21.036')] [2024-09-01 15:46:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 2998272. Throughput: 0: 235.3. Samples: 751172. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:10,143][00194] Avg episode reward: [(0, '21.323')] [2024-09-01 15:46:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3002368. Throughput: 0: 223.9. Samples: 752158. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:15,138][00194] Avg episode reward: [(0, '22.063')] [2024-09-01 15:46:18,718][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth... [2024-09-01 15:46:18,823][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000681_2789376.pth [2024-09-01 15:46:18,834][03021] Saving new best policy, reward=22.063! [2024-09-01 15:46:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 3006464. Throughput: 0: 224.1. Samples: 753616. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:20,145][00194] Avg episode reward: [(0, '21.768')] [2024-09-01 15:46:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3010560. Throughput: 0: 226.8. Samples: 754274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:25,142][00194] Avg episode reward: [(0, '21.474')] [2024-09-01 15:46:30,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3014656. Throughput: 0: 231.1. Samples: 755754. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:30,143][00194] Avg episode reward: [(0, '21.770')] [2024-09-01 15:46:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3018752. Throughput: 0: 218.1. Samples: 756912. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:35,138][00194] Avg episode reward: [(0, '21.529')] [2024-09-01 15:46:40,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3022848. Throughput: 0: 225.1. Samples: 757648. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:46:40,147][00194] Avg episode reward: [(0, '22.132')] [2024-09-01 15:46:44,223][03021] Saving new best policy, reward=22.132! [2024-09-01 15:46:44,235][03034] Updated weights for policy 0, policy_version 740 (0.1655) [2024-09-01 15:46:45,144][00194] Fps is (10 sec: 1227.7, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 3031040. Throughput: 0: 238.3. Samples: 759402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:46:45,149][00194] Avg episode reward: [(0, '21.245')] [2024-09-01 15:46:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3035136. Throughput: 0: 226.7. Samples: 760386. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:46:50,138][00194] Avg episode reward: [(0, '21.317')] [2024-09-01 15:46:55,136][00194] Fps is (10 sec: 819.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3039232. Throughput: 0: 215.9. Samples: 760886. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:46:55,139][00194] Avg episode reward: [(0, '21.480')] [2024-09-01 15:47:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3043328. Throughput: 0: 229.9. Samples: 762502. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:47:00,148][00194] Avg episode reward: [(0, '20.783')] [2024-09-01 15:47:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3047424. Throughput: 0: 229.4. Samples: 763938. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:47:05,140][00194] Avg episode reward: [(0, '20.889')] [2024-09-01 15:47:10,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). 
Total num frames: 3051520. Throughput: 0: 224.8. Samples: 764392. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:47:10,145][00194] Avg episode reward: [(0, '20.672')] [2024-09-01 15:47:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3055616. Throughput: 0: 226.2. Samples: 765932. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:47:15,144][00194] Avg episode reward: [(0, '19.423')] [2024-09-01 15:47:20,136][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3063808. Throughput: 0: 234.4. Samples: 767462. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:47:20,138][00194] Avg episode reward: [(0, '19.466')] [2024-09-01 15:47:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3063808. Throughput: 0: 233.9. Samples: 768174. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:47:25,138][00194] Avg episode reward: [(0, '19.382')] [2024-09-01 15:47:30,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3067904. Throughput: 0: 218.4. Samples: 769230. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:47:30,139][00194] Avg episode reward: [(0, '19.509')] [2024-09-01 15:47:30,957][03034] Updated weights for policy 0, policy_version 750 (0.2263) [2024-09-01 15:47:33,253][03021] Signal inference workers to stop experience collection... (750 times) [2024-09-01 15:47:33,307][03034] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-09-01 15:47:34,199][03021] Signal inference workers to resume experience collection... (750 times) [2024-09-01 15:47:34,201][03034] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-09-01 15:47:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3076096. Throughput: 0: 226.4. Samples: 770576. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:47:35,138][00194] Avg episode reward: [(0, '19.567')] [2024-09-01 15:47:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3080192. Throughput: 0: 236.5. Samples: 771530. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:47:40,146][00194] Avg episode reward: [(0, '19.152')] [2024-09-01 15:47:45,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 3084288. Throughput: 0: 225.1. Samples: 772634. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:47:45,140][00194] Avg episode reward: [(0, '19.631')] [2024-09-01 15:47:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3088384. Throughput: 0: 223.1. Samples: 773978. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:47:50,143][00194] Avg episode reward: [(0, '19.936')] [2024-09-01 15:47:55,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3092480. Throughput: 0: 231.4. Samples: 774804. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:47:55,147][00194] Avg episode reward: [(0, '19.917')] [2024-09-01 15:48:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3096576. Throughput: 0: 232.5. Samples: 776396. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:48:00,138][00194] Avg episode reward: [(0, '19.718')] [2024-09-01 15:48:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3100672. Throughput: 0: 222.0. Samples: 777450. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:48:05,139][00194] Avg episode reward: [(0, '19.588')] [2024-09-01 15:48:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3108864. Throughput: 0: 223.5. Samples: 778230. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:48:10,139][00194] Avg episode reward: [(0, '19.439')] [2024-09-01 15:48:13,726][03034] Updated weights for policy 0, policy_version 760 (0.1012) [2024-09-01 15:48:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3112960. Throughput: 0: 233.4. Samples: 779732. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:48:15,138][00194] Avg episode reward: [(0, '19.651')] [2024-09-01 15:48:18,866][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000761_3117056.pth... [2024-09-01 15:48:18,957][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth [2024-09-01 15:48:20,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3117056. Throughput: 0: 228.6. Samples: 780862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:48:20,148][00194] Avg episode reward: [(0, '19.440')] [2024-09-01 15:48:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3121152. Throughput: 0: 223.7. Samples: 781596. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:48:25,139][00194] Avg episode reward: [(0, '19.644')] [2024-09-01 15:48:30,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3125248. Throughput: 0: 232.1. Samples: 783078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:48:30,138][00194] Avg episode reward: [(0, '20.063')] [2024-09-01 15:48:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3129344. Throughput: 0: 232.0. Samples: 784416. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:48:35,145][00194] Avg episode reward: [(0, '20.287')] [2024-09-01 15:48:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3133440. Throughput: 0: 226.0. Samples: 784972. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:48:40,140][00194] Avg episode reward: [(0, '19.418')] [2024-09-01 15:48:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3137536. Throughput: 0: 224.2. Samples: 786486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:48:45,142][00194] Avg episode reward: [(0, '19.444')] [2024-09-01 15:48:50,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3145728. Throughput: 0: 232.2. Samples: 787900. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:48:50,140][00194] Avg episode reward: [(0, '19.318')] [2024-09-01 15:48:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3145728. Throughput: 0: 229.4. Samples: 788552. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:48:55,140][00194] Avg episode reward: [(0, '19.341')] [2024-09-01 15:49:00,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3149824. Throughput: 0: 222.4. Samples: 789742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:49:00,148][00194] Avg episode reward: [(0, '19.888')] [2024-09-01 15:49:00,843][03034] Updated weights for policy 0, policy_version 770 (0.2036) [2024-09-01 15:49:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3158016. Throughput: 0: 225.8. Samples: 791024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:49:05,138][00194] Avg episode reward: [(0, '20.481')] [2024-09-01 15:49:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3162112. Throughput: 0: 230.9. Samples: 791988. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:49:10,139][00194] Avg episode reward: [(0, '21.718')] [2024-09-01 15:49:15,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3166208. Throughput: 0: 222.5. Samples: 793090. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:49:15,144][00194] Avg episode reward: [(0, '22.134')] [2024-09-01 15:49:18,338][03021] Saving new best policy, reward=22.134! [2024-09-01 15:49:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3170304. Throughput: 0: 225.4. Samples: 794558. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:49:20,139][00194] Avg episode reward: [(0, '21.760')] [2024-09-01 15:49:25,137][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3174400. Throughput: 0: 228.4. Samples: 795248. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:49:25,139][00194] Avg episode reward: [(0, '22.142')] [2024-09-01 15:49:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3178496. Throughput: 0: 229.4. Samples: 796810. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:49:30,142][00194] Avg episode reward: [(0, '22.131')] [2024-09-01 15:49:31,648][03021] Saving new best policy, reward=22.142! [2024-09-01 15:49:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3182592. Throughput: 0: 221.9. Samples: 797884. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 15:49:35,139][00194] Avg episode reward: [(0, '22.599')] [2024-09-01 15:49:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3186688. Throughput: 0: 224.4. Samples: 798648. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:49:40,147][00194] Avg episode reward: [(0, '22.329')] [2024-09-01 15:49:40,183][03021] Saving new best policy, reward=22.599! [2024-09-01 15:49:44,449][03034] Updated weights for policy 0, policy_version 780 (0.1073) [2024-09-01 15:49:45,138][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3194880. Throughput: 0: 231.7. Samples: 800170. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:49:45,141][00194] Avg episode reward: [(0, '22.209')] [2024-09-01 15:49:50,136][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3198976. Throughput: 0: 224.9. Samples: 801146. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:49:50,147][00194] Avg episode reward: [(0, '22.078')] [2024-09-01 15:49:55,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3203072. Throughput: 0: 219.7. Samples: 801876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:49:55,138][00194] Avg episode reward: [(0, '21.512')] [2024-09-01 15:50:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3207168. Throughput: 0: 226.1. Samples: 803262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:00,146][00194] Avg episode reward: [(0, '21.933')] [2024-09-01 15:50:05,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3211264. Throughput: 0: 227.3. Samples: 804786. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:05,144][00194] Avg episode reward: [(0, '22.340')] [2024-09-01 15:50:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3215360. Throughput: 0: 221.6. Samples: 805222. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:10,140][00194] Avg episode reward: [(0, '22.113')] [2024-09-01 15:50:15,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3219456. Throughput: 0: 214.0. Samples: 806438. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:15,144][00194] Avg episode reward: [(0, '22.157')] [2024-09-01 15:50:16,523][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000787_3223552.pth... 
[2024-09-01 15:50:16,635][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth [2024-09-01 15:50:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3223552. Throughput: 0: 232.2. Samples: 808334. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:20,145][00194] Avg episode reward: [(0, '22.172')] [2024-09-01 15:50:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3227648. Throughput: 0: 223.5. Samples: 808706. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:25,143][00194] Avg episode reward: [(0, '22.708')] [2024-09-01 15:50:26,192][03021] Saving new best policy, reward=22.708! [2024-09-01 15:50:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3231744. Throughput: 0: 217.7. Samples: 809964. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:30,148][00194] Avg episode reward: [(0, '22.729')] [2024-09-01 15:50:31,092][03034] Updated weights for policy 0, policy_version 790 (0.1658) [2024-09-01 15:50:34,963][03021] Saving new best policy, reward=22.729! [2024-09-01 15:50:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3239936. Throughput: 0: 230.4. Samples: 811516. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:35,140][00194] Avg episode reward: [(0, '22.713')] [2024-09-01 15:50:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3244032. Throughput: 0: 236.8. Samples: 812534. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:40,142][00194] Avg episode reward: [(0, '22.768')] [2024-09-01 15:50:44,838][03021] Saving new best policy, reward=22.768! [2024-09-01 15:50:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3248128. Throughput: 0: 227.6. Samples: 813502. 
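The checkpoint save/remove pair above shows the naming scheme this run uses: `checkpoint_<policy_version>_<env_frames>.pth`, and in every filename in this log the frame count equals the policy version times 4096 (e.g. 787 × 4096 = 3223552). A minimal sketch of extracting those two fields; the helper name `parse_checkpoint` is ours, not part of Sample Factory:

```python
import re

def parse_checkpoint(path):
    """Extract (policy_version, env_frames) from a checkpoint filename
    like checkpoint_000000787_3223552.pth, as seen in this log."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth", path)
    if not m:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint(
    "/content/train_dir/default_experiment/checkpoint_p0/"
    "checkpoint_000000787_3223552.pth")
# In this run, frames advance by 4096 per policy version:
assert frames == version * 4096
```

The same relation holds for the removed checkpoint (734 × 4096 = 3006464), which is consistent with the trainer retiring older checkpoints as new ones land.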
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:50:45,142][00194] Avg episode reward: [(0, '22.931')] [2024-09-01 15:50:49,345][03021] Saving new best policy, reward=22.931! [2024-09-01 15:50:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3252224. Throughput: 0: 219.9. Samples: 814680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:50:50,145][00194] Avg episode reward: [(0, '22.894')] [2024-09-01 15:50:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3256320. Throughput: 0: 230.9. Samples: 815614. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:50:55,138][00194] Avg episode reward: [(0, '22.140')] [2024-09-01 15:51:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3260416. Throughput: 0: 233.3. Samples: 816938. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:00,144][00194] Avg episode reward: [(0, '21.306')] [2024-09-01 15:51:05,140][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3264512. Throughput: 0: 215.8. Samples: 818048. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:05,142][00194] Avg episode reward: [(0, '21.112')] [2024-09-01 15:51:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3268608. Throughput: 0: 223.4. Samples: 818760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:10,148][00194] Avg episode reward: [(0, '21.764')] [2024-09-01 15:51:15,040][03034] Updated weights for policy 0, policy_version 800 (0.0701) [2024-09-01 15:51:15,136][00194] Fps is (10 sec: 1229.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3276800. Throughput: 0: 239.1. Samples: 820724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:51:15,140][00194] Avg episode reward: [(0, '21.618')] [2024-09-01 15:51:18,744][03021] Signal inference workers to stop experience collection... 
(800 times) [2024-09-01 15:51:18,854][03034] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-09-01 15:51:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3276800. Throughput: 0: 225.5. Samples: 821662. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:51:20,139][00194] Avg episode reward: [(0, '20.990')] [2024-09-01 15:51:20,683][03021] Signal inference workers to resume experience collection... (800 times) [2024-09-01 15:51:20,684][03034] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-09-01 15:51:25,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3280896. Throughput: 0: 213.0. Samples: 822118. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:51:25,151][00194] Avg episode reward: [(0, '20.894')] [2024-09-01 15:51:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3289088. Throughput: 0: 227.7. Samples: 823750. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:30,138][00194] Avg episode reward: [(0, '21.078')] [2024-09-01 15:51:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3293184. Throughput: 0: 233.4. Samples: 825184. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:51:35,143][00194] Avg episode reward: [(0, '21.050')] [2024-09-01 15:51:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3297280. Throughput: 0: 225.6. Samples: 825768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 15:51:40,139][00194] Avg episode reward: [(0, '21.044')] [2024-09-01 15:51:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3301376. Throughput: 0: 221.7. Samples: 826916. 
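The periodic `Fps is (...)` lines above carry the run's throughput history in a fixed format. A sketch of pulling them into a dict for plotting or monitoring, using only the fields visible in this log (the key names are our own choice):

```python
import re

# Matches the stat lines emitted by this run, e.g.
# "Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: ..."
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)\. Throughput: 0: ([\d.]+)\. Samples: (\d+)\.")

def parse_fps_line(line):
    """Return the FPS stats from one log line as a dict, or None."""
    m = FPS_RE.search(line)
    if not m:
        return None
    keys = ("fps_10s", "fps_60s", "fps_300s", "frames", "throughput", "samples")
    vals = (float(m.group(1)), float(m.group(2)), float(m.group(3)),
            int(m.group(4)), float(m.group(5)), int(m.group(6)))
    return dict(zip(keys, vals))

line = ("[2024-09-01 15:51:30,136][00194] Fps is (10 sec: 1228.8, "
        "60 sec: 955.7, 300 sec: 916.4). Total num frames: 3289088. "
        "Throughput: 0: 227.7. Samples: 823750.")
stats = parse_fps_line(line)
```

Note the three windows (10/60/300 s) are moving averages at different horizons, which is why the 10-second figure oscillates between roughly 409 and 1229 while the 300-second figure stays near 900 to 916.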
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:45,139][00194] Avg episode reward: [(0, '21.593')] [2024-09-01 15:51:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3305472. Throughput: 0: 242.6. Samples: 828966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:50,145][00194] Avg episode reward: [(0, '21.868')] [2024-09-01 15:51:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3309568. Throughput: 0: 235.1. Samples: 829338. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:51:55,141][00194] Avg episode reward: [(0, '21.348')] [2024-09-01 15:52:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3313664. Throughput: 0: 215.9. Samples: 830438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:52:00,142][00194] Avg episode reward: [(0, '21.471')] [2024-09-01 15:52:01,626][03034] Updated weights for policy 0, policy_version 810 (0.1549) [2024-09-01 15:52:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3321856. Throughput: 0: 230.6. Samples: 832038. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:05,139][00194] Avg episode reward: [(0, '21.186')] [2024-09-01 15:52:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3325952. Throughput: 0: 243.1. Samples: 833058. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:10,144][00194] Avg episode reward: [(0, '21.748')] [2024-09-01 15:52:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3330048. Throughput: 0: 228.8. Samples: 834046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:15,142][00194] Avg episode reward: [(0, '21.807')] [2024-09-01 15:52:19,027][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000814_3334144.pth... 
[2024-09-01 15:52:19,134][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000761_3117056.pth [2024-09-01 15:52:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3334144. Throughput: 0: 226.2. Samples: 835364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:20,140][00194] Avg episode reward: [(0, '21.584')] [2024-09-01 15:52:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3338240. Throughput: 0: 231.3. Samples: 836176. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:25,150][00194] Avg episode reward: [(0, '20.947')] [2024-09-01 15:52:30,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3342336. Throughput: 0: 235.4. Samples: 837508. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:52:30,142][00194] Avg episode reward: [(0, '21.145')] [2024-09-01 15:52:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3346432. Throughput: 0: 216.8. Samples: 838722. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:52:35,138][00194] Avg episode reward: [(0, '21.075')] [2024-09-01 15:52:40,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3350528. Throughput: 0: 224.1. Samples: 839422. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:52:40,147][00194] Avg episode reward: [(0, '20.888')] [2024-09-01 15:52:44,980][03034] Updated weights for policy 0, policy_version 820 (0.1016) [2024-09-01 15:52:45,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3358720. Throughput: 0: 238.8. Samples: 841186. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:45,139][00194] Avg episode reward: [(0, '21.004')] [2024-09-01 15:52:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3358720. Throughput: 0: 226.3. Samples: 842220. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:50,139][00194] Avg episode reward: [(0, '20.840')] [2024-09-01 15:52:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3366912. Throughput: 0: 217.8. Samples: 842858. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:52:55,141][00194] Avg episode reward: [(0, '20.895')] [2024-09-01 15:53:00,136][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3371008. Throughput: 0: 228.1. Samples: 844312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:00,139][00194] Avg episode reward: [(0, '21.330')] [2024-09-01 15:53:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3375104. Throughput: 0: 228.2. Samples: 845632. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:05,143][00194] Avg episode reward: [(0, '21.007')] [2024-09-01 15:53:10,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3379200. Throughput: 0: 224.8. Samples: 846290. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:10,139][00194] Avg episode reward: [(0, '20.868')] [2024-09-01 15:53:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3383296. Throughput: 0: 224.8. Samples: 847624. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:15,138][00194] Avg episode reward: [(0, '21.265')] [2024-09-01 15:53:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3387392. Throughput: 0: 235.4. Samples: 849314. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:20,140][00194] Avg episode reward: [(0, '20.929')] [2024-09-01 15:53:25,142][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3391488. Throughput: 0: 229.9. Samples: 849768. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:25,145][00194] Avg episode reward: [(0, '21.351')] [2024-09-01 15:53:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3395584. Throughput: 0: 216.4. Samples: 850922. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:30,149][00194] Avg episode reward: [(0, '21.766')] [2024-09-01 15:53:30,761][03034] Updated weights for policy 0, policy_version 830 (0.2096) [2024-09-01 15:53:35,136][00194] Fps is (10 sec: 1229.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3403776. Throughput: 0: 227.9. Samples: 852476. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:35,146][00194] Avg episode reward: [(0, '21.151')] [2024-09-01 15:53:40,137][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3407872. Throughput: 0: 236.3. Samples: 853494. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:40,140][00194] Avg episode reward: [(0, '21.429')] [2024-09-01 15:53:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3411968. Throughput: 0: 227.2. Samples: 854534. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:53:45,144][00194] Avg episode reward: [(0, '21.514')] [2024-09-01 15:53:50,136][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3416064. Throughput: 0: 226.6. Samples: 855828. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:50,139][00194] Avg episode reward: [(0, '21.650')] [2024-09-01 15:53:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3420160. Throughput: 0: 232.0. Samples: 856730. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:53:55,144][00194] Avg episode reward: [(0, '21.326')] [2024-09-01 15:54:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3424256. Throughput: 0: 231.4. Samples: 858036. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:00,144][00194] Avg episode reward: [(0, '21.083')] [2024-09-01 15:54:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3428352. Throughput: 0: 222.5. Samples: 859326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:05,138][00194] Avg episode reward: [(0, '21.089')] [2024-09-01 15:54:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3432448. Throughput: 0: 227.6. Samples: 860010. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:10,146][00194] Avg episode reward: [(0, '20.824')] [2024-09-01 15:54:14,856][03034] Updated weights for policy 0, policy_version 840 (0.1524) [2024-09-01 15:54:15,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3440640. Throughput: 0: 240.3. Samples: 861734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:15,139][00194] Avg episode reward: [(0, '21.211')] [2024-09-01 15:54:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3440640. Throughput: 0: 227.5. Samples: 862714. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:20,144][00194] Avg episode reward: [(0, '21.716')] [2024-09-01 15:54:20,352][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth... [2024-09-01 15:54:20,468][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000787_3223552.pth [2024-09-01 15:54:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3448832. Throughput: 0: 226.2. Samples: 863674. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:25,139][00194] Avg episode reward: [(0, '21.447')] [2024-09-01 15:54:30,137][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3452928. Throughput: 0: 229.6. Samples: 864866. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:30,140][00194] Avg episode reward: [(0, '21.911')] [2024-09-01 15:54:35,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3457024. Throughput: 0: 230.3. Samples: 866194. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:35,139][00194] Avg episode reward: [(0, '21.879')] [2024-09-01 15:54:40,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3461120. Throughput: 0: 224.9. Samples: 866850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:54:40,141][00194] Avg episode reward: [(0, '21.792')] [2024-09-01 15:54:45,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3465216. Throughput: 0: 226.0. Samples: 868208. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:45,139][00194] Avg episode reward: [(0, '21.688')] [2024-09-01 15:54:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3469312. Throughput: 0: 235.2. Samples: 869912. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:50,139][00194] Avg episode reward: [(0, '21.989')] [2024-09-01 15:54:55,138][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3473408. Throughput: 0: 225.6. Samples: 870164. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:54:55,140][00194] Avg episode reward: [(0, '21.854')] [2024-09-01 15:55:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3477504. Throughput: 0: 223.3. Samples: 871784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:00,139][00194] Avg episode reward: [(0, '22.389')] [2024-09-01 15:55:00,217][03034] Updated weights for policy 0, policy_version 850 (0.1723) [2024-09-01 15:55:02,590][03021] Signal inference workers to stop experience collection... 
(850 times) [2024-09-01 15:55:02,648][03034] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-09-01 15:55:03,981][03021] Signal inference workers to resume experience collection... (850 times) [2024-09-01 15:55:03,983][03034] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-09-01 15:55:05,136][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3485696. Throughput: 0: 228.7. Samples: 873006. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:05,141][00194] Avg episode reward: [(0, '21.823')] [2024-09-01 15:55:10,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3489792. Throughput: 0: 227.0. Samples: 873890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:10,140][00194] Avg episode reward: [(0, '21.858')] [2024-09-01 15:55:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3493888. Throughput: 0: 226.2. Samples: 875046. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:15,144][00194] Avg episode reward: [(0, '21.996')] [2024-09-01 15:55:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3497984. Throughput: 0: 231.9. Samples: 876628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:20,144][00194] Avg episode reward: [(0, '21.823')] [2024-09-01 15:55:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3502080. Throughput: 0: 232.5. Samples: 877312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:25,139][00194] Avg episode reward: [(0, '22.289')] [2024-09-01 15:55:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3506176. Throughput: 0: 226.8. Samples: 878412. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:30,144][00194] Avg episode reward: [(0, '22.016')] [2024-09-01 15:55:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3510272. Throughput: 0: 223.4. Samples: 879966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:35,139][00194] Avg episode reward: [(0, '22.156')] [2024-09-01 15:55:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3518464. Throughput: 0: 233.2. Samples: 880656. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:40,141][00194] Avg episode reward: [(0, '22.198')] [2024-09-01 15:55:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3518464. Throughput: 0: 230.8. Samples: 882170. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:55:45,139][00194] Avg episode reward: [(0, '21.941')] [2024-09-01 15:55:45,174][03034] Updated weights for policy 0, policy_version 860 (0.0529) [2024-09-01 15:55:50,136][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3522560. Throughput: 0: 227.2. Samples: 883228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:50,144][00194] Avg episode reward: [(0, '22.074')] [2024-09-01 15:55:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 3530752. Throughput: 0: 223.4. Samples: 883944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:55:55,138][00194] Avg episode reward: [(0, '22.073')] [2024-09-01 15:56:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3534848. Throughput: 0: 228.3. Samples: 885318. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:00,144][00194] Avg episode reward: [(0, '22.349')] [2024-09-01 15:56:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3538944. Throughput: 0: 222.0. Samples: 886618. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:05,143][00194] Avg episode reward: [(0, '22.816')] [2024-09-01 15:56:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3543040. Throughput: 0: 222.9. Samples: 887342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:10,143][00194] Avg episode reward: [(0, '23.400')] [2024-09-01 15:56:12,309][03021] Saving new best policy, reward=23.400! [2024-09-01 15:56:15,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3547136. Throughput: 0: 232.2. Samples: 888860. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:15,146][00194] Avg episode reward: [(0, '23.841')] [2024-09-01 15:56:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3551232. Throughput: 0: 231.4. Samples: 890380. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:20,141][00194] Avg episode reward: [(0, '23.955')] [2024-09-01 15:56:21,064][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000868_3555328.pth... [2024-09-01 15:56:21,245][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000814_3334144.pth [2024-09-01 15:56:21,269][03021] Saving new best policy, reward=23.841! [2024-09-01 15:56:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3555328. Throughput: 0: 223.7. Samples: 890724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:25,140][00194] Avg episode reward: [(0, '24.086')] [2024-09-01 15:56:26,449][03021] Saving new best policy, reward=23.955! [2024-09-01 15:56:26,578][03021] Saving new best policy, reward=24.086! [2024-09-01 15:56:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3559424. Throughput: 0: 223.8. Samples: 892242. 
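The `Saving new best policy, reward=...!` lines above record each time the evaluator's running-best episode reward improves. A small sketch that recovers that sequence from log lines of this exact shape (function name `best_policy_saves` is ours):

```python
import re

def best_policy_saves(log_lines):
    """Yield the reward value from each 'Saving new best policy' line."""
    pat = re.compile(r"Saving new best policy, reward=([\d.]+)!")
    for line in log_lines:
        m = pat.search(line)
        if m:
            yield float(m.group(1))

# Lines copied from this section of the log:
log = [
    "[2024-09-01 15:56:12,309][03021] Saving new best policy, reward=23.400!",
    "[2024-09-01 15:56:26,449][03021] Saving new best policy, reward=23.955!",
    "[2024-09-01 15:56:26,578][03021] Saving new best policy, reward=24.086!",
]
rewards = list(best_policy_saves(log))
```

As the surrounding lines show, the saves are asynchronous: a save for reward 23.841 can be logged after a higher reward has already appeared in the stats stream, so the saved sequence is monotone per save event but lags the reported averages slightly.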
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:30,138][00194] Avg episode reward: [(0, '23.894')] [2024-09-01 15:56:30,884][03034] Updated weights for policy 0, policy_version 870 (0.1527) [2024-09-01 15:56:35,143][00194] Fps is (10 sec: 1227.9, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 3567616. Throughput: 0: 228.1. Samples: 893494. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:35,151][00194] Avg episode reward: [(0, '24.334')] [2024-09-01 15:56:39,670][03021] Saving new best policy, reward=24.334! [2024-09-01 15:56:40,141][00194] Fps is (10 sec: 1228.2, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3571712. Throughput: 0: 227.5. Samples: 894184. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:40,144][00194] Avg episode reward: [(0, '24.897')] [2024-09-01 15:56:44,827][03021] Saving new best policy, reward=24.897! [2024-09-01 15:56:45,136][00194] Fps is (10 sec: 819.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3575808. Throughput: 0: 224.9. Samples: 895438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:45,138][00194] Avg episode reward: [(0, '24.088')] [2024-09-01 15:56:50,136][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3579904. Throughput: 0: 229.5. Samples: 896944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:50,139][00194] Avg episode reward: [(0, '23.959')] [2024-09-01 15:56:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3584000. Throughput: 0: 228.2. Samples: 897610. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:56:55,146][00194] Avg episode reward: [(0, '24.016')] [2024-09-01 15:57:00,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3588096. Throughput: 0: 218.5. Samples: 898694. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:00,150][00194] Avg episode reward: [(0, '23.223')] [2024-09-01 15:57:05,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3592192. Throughput: 0: 219.6. Samples: 900264. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:05,146][00194] Avg episode reward: [(0, '23.286')] [2024-09-01 15:57:10,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3596288. Throughput: 0: 222.5. Samples: 900738. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:57:10,148][00194] Avg episode reward: [(0, '23.471')] [2024-09-01 15:57:15,145][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3600384. Throughput: 0: 227.9. Samples: 902498. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:57:15,148][00194] Avg episode reward: [(0, '23.546')] [2024-09-01 15:57:16,193][03034] Updated weights for policy 0, policy_version 880 (0.1056) [2024-09-01 15:57:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3604480. Throughput: 0: 225.2. Samples: 903626. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:57:20,138][00194] Avg episode reward: [(0, '23.941')] [2024-09-01 15:57:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3612672. Throughput: 0: 228.4. Samples: 904460. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:25,146][00194] Avg episode reward: [(0, '23.880')] [2024-09-01 15:57:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3616768. Throughput: 0: 232.1. Samples: 905882. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:30,138][00194] Avg episode reward: [(0, '23.270')] [2024-09-01 15:57:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 916.4). Total num frames: 3620864. Throughput: 0: 222.6. Samples: 906962. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:35,138][00194] Avg episode reward: [(0, '23.261')] [2024-09-01 15:57:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3624960. Throughput: 0: 224.9. Samples: 907730. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:57:40,141][00194] Avg episode reward: [(0, '23.318')] [2024-09-01 15:57:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3629056. Throughput: 0: 235.6. Samples: 909294. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:57:45,145][00194] Avg episode reward: [(0, '22.652')] [2024-09-01 15:57:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3633152. Throughput: 0: 235.8. Samples: 910876. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:57:50,144][00194] Avg episode reward: [(0, '22.464')] [2024-09-01 15:57:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3637248. Throughput: 0: 227.6. Samples: 910982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:57:55,141][00194] Avg episode reward: [(0, '21.841')] [2024-09-01 15:58:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3641344. Throughput: 0: 226.2. Samples: 912676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:58:00,149][00194] Avg episode reward: [(0, '21.566')] [2024-09-01 15:58:00,665][03034] Updated weights for policy 0, policy_version 890 (0.1654) [2024-09-01 15:58:05,138][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3649536. Throughput: 0: 229.5. Samples: 913952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:58:05,141][00194] Avg episode reward: [(0, '21.470')] [2024-09-01 15:58:10,138][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3653632. Throughput: 0: 226.7. Samples: 914660. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:58:10,140][00194] Avg episode reward: [(0, '20.831')] [2024-09-01 15:58:15,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3657728. Throughput: 0: 225.2. Samples: 916018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:58:15,138][00194] Avg episode reward: [(0, '20.667')] [2024-09-01 15:58:18,419][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth... [2024-09-01 15:58:18,539][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000841_3444736.pth [2024-09-01 15:58:20,136][00194] Fps is (10 sec: 819.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3661824. Throughput: 0: 235.7. Samples: 917570. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:20,138][00194] Avg episode reward: [(0, '21.120')] [2024-09-01 15:58:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3665920. Throughput: 0: 233.9. Samples: 918254. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:25,139][00194] Avg episode reward: [(0, '20.924')] [2024-09-01 15:58:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3670016. Throughput: 0: 222.7. Samples: 919316. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:30,142][00194] Avg episode reward: [(0, '21.131')] [2024-09-01 15:58:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3674112. Throughput: 0: 223.5. Samples: 920932. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:35,139][00194] Avg episode reward: [(0, '20.997')] [2024-09-01 15:58:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3678208. Throughput: 0: 236.2. Samples: 921612. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:40,139][00194] Avg episode reward: [(0, '21.241')] [2024-09-01 15:58:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3682304. Throughput: 0: 231.7. Samples: 923102. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:58:45,138][00194] Avg episode reward: [(0, '21.549')] [2024-09-01 15:58:45,651][03034] Updated weights for policy 0, policy_version 900 (0.0575) [2024-09-01 15:58:49,322][03021] Signal inference workers to stop experience collection... (900 times) [2024-09-01 15:58:49,396][03034] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-09-01 15:58:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3686400. Throughput: 0: 226.9. Samples: 924160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 15:58:50,141][00194] Avg episode reward: [(0, '21.557')] [2024-09-01 15:58:50,492][03021] Signal inference workers to resume experience collection... (900 times) [2024-09-01 15:58:50,493][03034] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-09-01 15:58:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3694592. Throughput: 0: 230.4. Samples: 925028. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:58:55,145][00194] Avg episode reward: [(0, '22.134')] [2024-09-01 15:59:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3698688. Throughput: 0: 230.0. Samples: 926366. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:59:00,140][00194] Avg episode reward: [(0, '21.879')] [2024-09-01 15:59:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3702784. Throughput: 0: 218.3. Samples: 927392. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:05,139][00194] Avg episode reward: [(0, '21.876')] [2024-09-01 15:59:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3706880. Throughput: 0: 222.0. Samples: 928244. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:10,144][00194] Avg episode reward: [(0, '21.649')] [2024-09-01 15:59:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3710976. Throughput: 0: 231.6. Samples: 929736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:15,148][00194] Avg episode reward: [(0, '21.283')] [2024-09-01 15:59:20,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3715072. Throughput: 0: 224.1. Samples: 931016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 15:59:20,150][00194] Avg episode reward: [(0, '21.560')] [2024-09-01 15:59:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3719168. Throughput: 0: 221.4. Samples: 931574. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:59:25,144][00194] Avg episode reward: [(0, '22.354')] [2024-09-01 15:59:30,136][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3723264. Throughput: 0: 223.9. Samples: 933176. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 15:59:30,139][00194] Avg episode reward: [(0, '22.562')] [2024-09-01 15:59:30,695][03034] Updated weights for policy 0, policy_version 910 (0.1564) [2024-09-01 15:59:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3731456. Throughput: 0: 230.4. Samples: 934526. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:35,139][00194] Avg episode reward: [(0, '22.871')] [2024-09-01 15:59:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3735552. Throughput: 0: 229.1. Samples: 935338. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:40,142][00194] Avg episode reward: [(0, '23.149')] [2024-09-01 15:59:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3739648. Throughput: 0: 225.9. Samples: 936532. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:45,146][00194] Avg episode reward: [(0, '23.944')] [2024-09-01 15:59:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3743744. Throughput: 0: 235.4. Samples: 937984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:50,139][00194] Avg episode reward: [(0, '24.146')] [2024-09-01 15:59:55,137][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3747840. Throughput: 0: 230.7. Samples: 938624. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 15:59:55,144][00194] Avg episode reward: [(0, '24.034')] [2024-09-01 16:00:00,138][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3751936. Throughput: 0: 223.5. Samples: 939794. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:00,140][00194] Avg episode reward: [(0, '24.205')] [2024-09-01 16:00:05,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3756032. Throughput: 0: 228.6. Samples: 941300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:05,141][00194] Avg episode reward: [(0, '24.291')] [2024-09-01 16:00:10,136][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3760128. Throughput: 0: 226.6. Samples: 941772. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:10,147][00194] Avg episode reward: [(0, '24.125')] [2024-09-01 16:00:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3764224. Throughput: 0: 227.2. Samples: 943402. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:15,139][00194] Avg episode reward: [(0, '23.769')] [2024-09-01 16:00:16,114][03034] Updated weights for policy 0, policy_version 920 (0.1540) [2024-09-01 16:00:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3768320. Throughput: 0: 223.6. Samples: 944586. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:20,148][00194] Avg episode reward: [(0, '24.086')] [2024-09-01 16:00:21,046][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000921_3772416.pth... [2024-09-01 16:00:21,155][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000868_3555328.pth [2024-09-01 16:00:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3776512. Throughput: 0: 221.5. Samples: 945304. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:25,139][00194] Avg episode reward: [(0, '23.475')] [2024-09-01 16:00:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3780608. Throughput: 0: 225.3. Samples: 946672. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:00:30,141][00194] Avg episode reward: [(0, '24.147')] [2024-09-01 16:00:35,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 3784704. Throughput: 0: 218.6. Samples: 947822. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:00:35,144][00194] Avg episode reward: [(0, '24.289')] [2024-09-01 16:00:40,141][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3788800. Throughput: 0: 219.0. Samples: 948482. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:00:40,151][00194] Avg episode reward: [(0, '24.021')] [2024-09-01 16:00:45,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3792896. Throughput: 0: 230.3. Samples: 950156. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:45,150][00194] Avg episode reward: [(0, '23.460')] [2024-09-01 16:00:50,137][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3796992. Throughput: 0: 231.0. Samples: 951696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:50,144][00194] Avg episode reward: [(0, '23.930')] [2024-09-01 16:00:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3801088. Throughput: 0: 227.1. Samples: 951992. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:00:55,138][00194] Avg episode reward: [(0, '24.220')] [2024-09-01 16:01:00,136][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3805184. Throughput: 0: 226.5. Samples: 953594. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:00,148][00194] Avg episode reward: [(0, '23.937')] [2024-09-01 16:01:00,770][03034] Updated weights for policy 0, policy_version 930 (0.0685) [2024-09-01 16:01:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3813376. Throughput: 0: 230.3. Samples: 954950. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:05,143][00194] Avg episode reward: [(0, '24.109')] [2024-09-01 16:01:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3813376. Throughput: 0: 231.3. Samples: 955712. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:10,143][00194] Avg episode reward: [(0, '24.109')] [2024-09-01 16:01:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3821568. Throughput: 0: 227.4. Samples: 956906. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:15,138][00194] Avg episode reward: [(0, '23.607')] [2024-09-01 16:01:20,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3825664. Throughput: 0: 234.9. Samples: 958392. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:20,139][00194] Avg episode reward: [(0, '22.258')] [2024-09-01 16:01:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3829760. Throughput: 0: 235.4. Samples: 959072. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:25,140][00194] Avg episode reward: [(0, '22.667')] [2024-09-01 16:01:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3833856. Throughput: 0: 222.8. Samples: 960184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:30,145][00194] Avg episode reward: [(0, '23.086')] [2024-09-01 16:01:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3837952. Throughput: 0: 223.7. Samples: 961762. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:35,140][00194] Avg episode reward: [(0, '23.359')] [2024-09-01 16:01:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3842048. Throughput: 0: 233.4. Samples: 962496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:40,146][00194] Avg episode reward: [(0, '22.653')] [2024-09-01 16:01:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3846144. Throughput: 0: 230.8. Samples: 963980. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:45,139][00194] Avg episode reward: [(0, '22.871')] [2024-09-01 16:01:45,943][03034] Updated weights for policy 0, policy_version 940 (0.0555) [2024-09-01 16:01:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3850240. Throughput: 0: 227.9. Samples: 965206. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:01:50,146][00194] Avg episode reward: [(0, '22.790')] [2024-09-01 16:01:55,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3858432. Throughput: 0: 228.5. Samples: 965996. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:01:55,138][00194] Avg episode reward: [(0, '22.756')] [2024-09-01 16:02:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3862528. Throughput: 0: 231.2. Samples: 967312. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:00,140][00194] Avg episode reward: [(0, '22.462')] [2024-09-01 16:02:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3866624. Throughput: 0: 224.1. Samples: 968476. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:05,139][00194] Avg episode reward: [(0, '22.907')] [2024-09-01 16:02:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3870720. Throughput: 0: 224.6. Samples: 969178. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:10,145][00194] Avg episode reward: [(0, '22.353')] [2024-09-01 16:02:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3874816. Throughput: 0: 236.0. Samples: 970806. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:15,144][00194] Avg episode reward: [(0, '23.123')] [2024-09-01 16:02:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3878912. Throughput: 0: 235.2. Samples: 972346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:20,139][00194] Avg episode reward: [(0, '22.516')] [2024-09-01 16:02:21,425][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000948_3883008.pth... [2024-09-01 16:02:21,533][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth [2024-09-01 16:02:25,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3883008. Throughput: 0: 226.1. Samples: 972670. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:25,139][00194] Avg episode reward: [(0, '22.653')] [2024-09-01 16:02:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3887104. Throughput: 0: 229.3. Samples: 974300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:30,145][00194] Avg episode reward: [(0, '22.519')] [2024-09-01 16:02:30,804][03034] Updated weights for policy 0, policy_version 950 (0.1483) [2024-09-01 16:02:33,248][03021] Signal inference workers to stop experience collection... (950 times) [2024-09-01 16:02:33,319][03034] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-09-01 16:02:34,217][03021] Signal inference workers to resume experience collection... (950 times) [2024-09-01 16:02:34,219][03034] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-09-01 16:02:35,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3895296. Throughput: 0: 229.2. Samples: 975518. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:35,143][00194] Avg episode reward: [(0, '23.093')] [2024-09-01 16:02:40,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3899392. Throughput: 0: 226.3. Samples: 976180. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:40,141][00194] Avg episode reward: [(0, '23.167')] [2024-09-01 16:02:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3903488. Throughput: 0: 227.1. Samples: 977530. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:02:45,139][00194] Avg episode reward: [(0, '23.087')] [2024-09-01 16:02:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3907584. Throughput: 0: 229.9. Samples: 978822. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:50,139][00194] Avg episode reward: [(0, '23.175')] [2024-09-01 16:02:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3911680. Throughput: 0: 230.1. Samples: 979534. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:02:55,139][00194] Avg episode reward: [(0, '23.824')] [2024-09-01 16:03:00,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3915776. Throughput: 0: 219.8. Samples: 980698. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:00,139][00194] Avg episode reward: [(0, '24.015')] [2024-09-01 16:03:05,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3919872. Throughput: 0: 220.0. Samples: 982244. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:05,139][00194] Avg episode reward: [(0, '23.490')] [2024-09-01 16:03:10,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3923968. Throughput: 0: 224.1. Samples: 982756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:10,149][00194] Avg episode reward: [(0, '23.447')] [2024-09-01 16:03:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3928064. Throughput: 0: 225.7. Samples: 984458. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:15,140][00194] Avg episode reward: [(0, '23.740')] [2024-09-01 16:03:16,295][03034] Updated weights for policy 0, policy_version 960 (0.2640) [2024-09-01 16:03:20,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3932160. Throughput: 0: 222.2. Samples: 985518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:03:20,148][00194] Avg episode reward: [(0, '24.262')] [2024-09-01 16:03:25,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3940352. Throughput: 0: 230.0. Samples: 986528. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:25,147][00194] Avg episode reward: [(0, '24.329')] [2024-09-01 16:03:30,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3944448. Throughput: 0: 226.9. Samples: 987742. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:30,143][00194] Avg episode reward: [(0, '24.238')] [2024-09-01 16:03:35,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3948544. Throughput: 0: 225.1. Samples: 988952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:35,143][00194] Avg episode reward: [(0, '24.320')] [2024-09-01 16:03:40,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3952640. Throughput: 0: 226.5. Samples: 989726. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:40,138][00194] Avg episode reward: [(0, '24.216')] [2024-09-01 16:03:45,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 3956736. Throughput: 0: 235.0. Samples: 991274. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:45,150][00194] Avg episode reward: [(0, '25.185')] [2024-09-01 16:03:50,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3960832. Throughput: 0: 234.0. Samples: 992774. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) [2024-09-01 16:03:50,140][00194] Avg episode reward: [(0, '25.249')] [2024-09-01 16:03:51,110][03021] Saving new best policy, reward=25.185! [2024-09-01 16:03:55,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3964928. Throughput: 0: 230.9. Samples: 993148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:03:55,142][00194] Avg episode reward: [(0, '25.319')] [2024-09-01 16:03:56,231][03021] Saving new best policy, reward=25.249! [2024-09-01 16:03:59,995][03021] Saving new best policy, reward=25.319! 
[2024-09-01 16:04:00,007][03034] Updated weights for policy 0, policy_version 970 (0.1020) [2024-09-01 16:04:00,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3973120. Throughput: 0: 229.5. Samples: 994784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:04:00,138][00194] Avg episode reward: [(0, '25.156')] [2024-09-01 16:04:05,136][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3977216. Throughput: 0: 230.3. Samples: 995880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:04:05,146][00194] Avg episode reward: [(0, '25.008')] [2024-09-01 16:04:10,139][00194] Fps is (10 sec: 818.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3981312. Throughput: 0: 224.2. Samples: 996620. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:10,147][00194] Avg episode reward: [(0, '25.910')] [2024-09-01 16:04:14,339][03021] Saving new best policy, reward=25.910! [2024-09-01 16:04:15,136][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3985408. Throughput: 0: 224.9. Samples: 997864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:15,146][00194] Avg episode reward: [(0, '25.117')] [2024-09-01 16:04:18,206][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth... [2024-09-01 16:04:18,320][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000921_3772416.pth [2024-09-01 16:04:20,136][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 3989504. Throughput: 0: 235.2. Samples: 999536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:20,143][00194] Avg episode reward: [(0, '24.987')] [2024-09-01 16:04:25,139][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 916.4). Total num frames: 3993600. Throughput: 0: 233.1. Samples: 1000218. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:25,149][00194] Avg episode reward: [(0, '25.077')] [2024-09-01 16:04:30,136][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 3997696. Throughput: 0: 224.0. Samples: 1001352. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:30,141][00194] Avg episode reward: [(0, '24.695')] [2024-09-01 16:04:35,136][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4001792. Throughput: 0: 226.2. Samples: 1002954. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:04:35,139][00194] Avg episode reward: [(0, '24.647')] [2024-09-01 16:04:36,205][03021] Stopping Batcher_0... [2024-09-01 16:04:36,207][03021] Loop batcher_evt_loop terminating... [2024-09-01 16:04:36,207][00194] Component Batcher_0 stopped! [2024-09-01 16:04:36,419][03034] Weights refcount: 2 0 [2024-09-01 16:04:36,423][00194] Component InferenceWorker_p0-w0 stopped! [2024-09-01 16:04:36,429][03034] Stopping InferenceWorker_p0-w0... [2024-09-01 16:04:36,430][03034] Loop inference_proc0-0_evt_loop terminating... [2024-09-01 16:04:36,780][00194] Component RolloutWorker_w2 stopped! [2024-09-01 16:04:36,788][03037] Stopping RolloutWorker_w2... [2024-09-01 16:04:36,814][03037] Loop rollout_proc2_evt_loop terminating... [2024-09-01 16:04:36,840][00194] Component RolloutWorker_w1 stopped! [2024-09-01 16:04:36,857][00194] Component RolloutWorker_w4 stopped! [2024-09-01 16:04:36,865][00194] Component RolloutWorker_w3 stopped! [2024-09-01 16:04:36,841][03036] Stopping RolloutWorker_w1... [2024-09-01 16:04:36,887][03036] Loop rollout_proc1_evt_loop terminating... [2024-09-01 16:04:36,890][00194] Component RolloutWorker_w6 stopped! [2024-09-01 16:04:36,911][03040] Stopping RolloutWorker_w5... [2024-09-01 16:04:36,911][00194] Component RolloutWorker_w5 stopped! [2024-09-01 16:04:36,926][00194] Component RolloutWorker_w7 stopped! 
[2024-09-01 16:04:36,871][03039] Stopping RolloutWorker_w4... [2024-09-01 16:04:36,888][03038] Stopping RolloutWorker_w3... [2024-09-01 16:04:36,946][00194] Component RolloutWorker_w0 stopped! [2024-09-01 16:04:36,898][03041] Stopping RolloutWorker_w6... [2024-09-01 16:04:36,954][03040] Loop rollout_proc5_evt_loop terminating... [2024-09-01 16:04:36,962][03038] Loop rollout_proc3_evt_loop terminating... [2024-09-01 16:04:36,952][03035] Stopping RolloutWorker_w0... [2024-09-01 16:04:36,954][03039] Loop rollout_proc4_evt_loop terminating... [2024-09-01 16:04:36,964][03041] Loop rollout_proc6_evt_loop terminating... [2024-09-01 16:04:36,945][03042] Stopping RolloutWorker_w7... [2024-09-01 16:04:36,992][03042] Loop rollout_proc7_evt_loop terminating... [2024-09-01 16:04:36,994][03035] Loop rollout_proc0_evt_loop terminating... [2024-09-01 16:04:41,411][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2024-09-01 16:04:41,524][03021] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000948_3883008.pth [2024-09-01 16:04:41,548][03021] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth... [2024-09-01 16:04:41,741][03021] Stopping LearnerWorker_p0... [2024-09-01 16:04:41,741][03021] Loop learner_proc0_evt_loop terminating... [2024-09-01 16:04:41,741][00194] Component LearnerWorker_p0 stopped! [2024-09-01 16:04:41,745][00194] Waiting for process learner_proc0 to stop... [2024-09-01 16:04:43,065][00194] Waiting for process inference_proc0-0 to join... [2024-09-01 16:04:43,073][00194] Waiting for process rollout_proc0 to join... [2024-09-01 16:04:44,240][00194] Waiting for process rollout_proc1 to join... [2024-09-01 16:04:44,250][00194] Waiting for process rollout_proc2 to join... [2024-09-01 16:04:44,278][00194] Waiting for process rollout_proc3 to join... [2024-09-01 16:04:44,286][00194] Waiting for process rollout_proc4 to join... 
[2024-09-01 16:04:44,297][00194] Waiting for process rollout_proc5 to join... [2024-09-01 16:04:44,301][00194] Waiting for process rollout_proc6 to join... [2024-09-01 16:04:44,309][00194] Waiting for process rollout_proc7 to join... [2024-09-01 16:04:44,314][00194] Batcher 0 profile tree view:
batching: 20.5903, releasing_batches: 0.2968
[2024-09-01 16:04:44,318][00194] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 44.9605
update_model: 132.0643
  weight_update: 0.0560
one_step: 0.0290
  handle_policy_step: 2882.0811
    deserialize: 91.9792, stack: 14.2129, obs_to_device_normalize: 498.6304, forward: 2093.8209, send_messages: 67.3715
    prepare_outputs: 34.0184
      to_cpu: 3.4641
[2024-09-01 16:04:44,320][00194] Learner 0 profile tree view:
misc: 0.0066, prepare_batch: 1289.7853
train: 3103.9562
  epoch_init: 0.0091, minibatch_init: 0.0245, losses_postprocess: 0.1424, kl_divergence: 0.4661, after_optimizer: 2.6855
  calculate_losses: 1524.2036
    losses_init: 0.0044, forward_head: 1364.9844, bptt_initial: 4.4690, tail: 3.4347, advantages_returns: 0.2297, losses: 1.4560
    bptt: 149.0615
      bptt_forward_core: 148.1692
  update: 1575.6538
    clip: 3.8090
[2024-09-01 16:04:44,323][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6465, enqueue_policy_requests: 55.2539, env_step: 1623.3020, overhead: 40.2221, complete_rollouts: 17.8695
save_policy_outputs: 40.9623
  split_output_tensors: 12.9830
[2024-09-01 16:04:44,325][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.7192, enqueue_policy_requests: 54.9433, env_step: 1625.3504, overhead: 40.9641, complete_rollouts: 15.5030
save_policy_outputs: 41.8531
  split_output_tensors: 13.9784
[2024-09-01 16:04:44,327][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 16:04:44,329][00194] Runner profile tree view: main_loop: 4465.4549 [2024-09-01 16:04:44,331][00194] Collected {0: 4009984}, FPS: 898.0 [2024-09-01 16:05:41,893][00194] Environment doom_basic already registered, overwriting... [2024-09-01 16:05:41,897][00194] Environment doom_two_colors_easy already registered, overwriting... [2024-09-01 16:05:41,898][00194] Environment doom_two_colors_hard already registered, overwriting... [2024-09-01 16:05:41,901][00194] Environment doom_dm already registered, overwriting... [2024-09-01 16:05:41,903][00194] Environment doom_dwango5 already registered, overwriting... [2024-09-01 16:05:41,905][00194] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-01 16:05:41,907][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-01 16:05:41,908][00194] Environment doom_my_way_home already registered, overwriting... [2024-09-01 16:05:41,911][00194] Environment doom_deadly_corridor already registered, overwriting... [2024-09-01 16:05:41,912][00194] Environment doom_defend_the_center already registered, overwriting... [2024-09-01 16:05:41,914][00194] Environment doom_defend_the_line already registered, overwriting... [2024-09-01 16:05:41,915][00194] Environment doom_health_gathering already registered, overwriting... [2024-09-01 16:05:41,917][00194] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-01 16:05:41,920][00194] Environment doom_battle already registered, overwriting... [2024-09-01 16:05:41,922][00194] Environment doom_battle2 already registered, overwriting... [2024-09-01 16:05:41,924][00194] Environment doom_duel_bots already registered, overwriting... [2024-09-01 16:05:41,926][00194] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-01 16:05:41,927][00194] Environment doom_duel already registered, overwriting... 
[2024-09-01 16:05:41,928][00194] Environment doom_deathmatch_full already registered, overwriting... [2024-09-01 16:05:41,930][00194] Environment doom_benchmark already registered, overwriting... [2024-09-01 16:05:41,931][00194] register_encoder_factory: [2024-09-01 16:05:41,965][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-01 16:05:41,975][00194] Experiment dir /content/train_dir/default_experiment already exists! [2024-09-01 16:05:41,976][00194] Resuming existing experiment from /content/train_dir/default_experiment... [2024-09-01 16:05:41,980][00194] Weights and Biases integration disabled [2024-09-01 16:05:41,986][00194] Environment var CUDA_VISIBLE_DEVICES is [2024-09-01 16:05:45,681][00194] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=cpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-01 16:05:45,685][00194] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-01 16:05:45,693][00194] Rollout worker 0 uses device cpu [2024-09-01 16:05:45,697][00194] Rollout worker 1 uses device cpu [2024-09-01 16:05:45,701][00194] Rollout worker 2 uses device cpu [2024-09-01 16:05:45,704][00194] Rollout worker 3 uses device cpu [2024-09-01 16:05:45,706][00194] Rollout worker 4 uses device cpu [2024-09-01 16:05:45,707][00194] Rollout worker 5 uses device cpu [2024-09-01 16:05:45,712][00194] Rollout worker 6 uses device cpu [2024-09-01 16:05:45,715][00194] Rollout worker 7 uses device cpu [2024-09-01 16:05:45,925][00194] InferenceWorker_p0-w0: min num requests: 2 [2024-09-01 16:05:45,969][00194] Starting all processes... [2024-09-01 16:05:45,971][00194] Starting process learner_proc0 [2024-09-01 16:05:46,019][00194] Starting all processes... [2024-09-01 16:05:46,027][00194] Starting process inference_proc0-0 [2024-09-01 16:05:46,028][00194] Starting process rollout_proc0 [2024-09-01 16:05:46,028][00194] Starting process rollout_proc1 [2024-09-01 16:05:46,028][00194] Starting process rollout_proc2 [2024-09-01 16:05:46,029][00194] Starting process rollout_proc3 [2024-09-01 16:05:46,029][00194] Starting process rollout_proc4 [2024-09-01 16:05:46,029][00194] Starting process rollout_proc5 [2024-09-01 16:05:46,038][00194] Starting process rollout_proc7 [2024-09-01 16:05:46,038][00194] Starting process rollout_proc6 [2024-09-01 16:06:06,170][25505] Starting seed is not provided [2024-09-01 16:06:06,170][25505] Initializing actor-critic model on device cpu [2024-09-01 16:06:06,171][25505] RunningMeanStd input shape: (3, 72, 128) [2024-09-01 16:06:06,174][25505] RunningMeanStd input shape: (1,) [2024-09-01 16:06:06,180][00194] Heartbeat connected on Batcher_0 [2024-09-01 16:06:06,359][25505] ConvEncoder: input_channels=3 [2024-09-01 16:06:06,444][25520] Worker 1 uses CPU cores [1] [2024-09-01 16:06:06,584][25523] Worker 4 uses CPU cores [0] [2024-09-01 16:06:06,611][25524] Worker 5 uses CPU cores [1] [2024-09-01 16:06:06,653][00194] 
Heartbeat connected on RolloutWorker_w1
[2024-09-01 16:06:06,804][00194] Heartbeat connected on RolloutWorker_w4
[2024-09-01 16:06:06,850][00194] Heartbeat connected on RolloutWorker_w5
[2024-09-01 16:06:06,868][25522] Worker 3 uses CPU cores [1]
[2024-09-01 16:06:06,906][25518] Worker 0 uses CPU cores [0]
[2024-09-01 16:06:06,916][25525] Worker 6 uses CPU cores [0]
[2024-09-01 16:06:06,920][00194] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-01 16:06:06,977][00194] Heartbeat connected on RolloutWorker_w0
[2024-09-01 16:06:06,987][00194] Heartbeat connected on RolloutWorker_w6
[2024-09-01 16:06:06,993][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 16:06:07,002][25521] Worker 2 uses CPU cores [0]
[2024-09-01 16:06:07,014][00194] Heartbeat connected on RolloutWorker_w2
[2024-09-01 16:06:07,021][25526] Worker 7 uses CPU cores [1]
[2024-09-01 16:06:07,032][00194] Heartbeat connected on RolloutWorker_w7
[2024-09-01 16:06:07,100][25505] Conv encoder output size: 512
[2024-09-01 16:06:07,101][25505] Policy head output size: 512
[2024-09-01 16:06:07,129][25505] Created Actor Critic model with architecture:
[2024-09-01 16:06:07,130][25505] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 16:06:07,851][25505] Using optimizer
[2024-09-01 16:06:07,853][25505] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2024-09-01 16:06:07,924][25505] Loading model from checkpoint
[2024-09-01 16:06:07,984][25505] Loaded experiment state at self.train_step=979, self.env_steps=4009984
[2024-09-01 16:06:07,985][25505] Initialized policy 0 weights for model version 979
[2024-09-01 16:06:07,990][25519] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:06:07,994][25505] LearnerWorker_p0 finished initialization!
[2024-09-01 16:06:07,993][25519] RunningMeanStd input shape: (1,)
[2024-09-01 16:06:08,001][00194] Heartbeat connected on LearnerWorker_p0
[2024-09-01 16:06:08,026][25519] ConvEncoder: input_channels=3
[2024-09-01 16:06:08,238][25519] Conv encoder output size: 512
[2024-09-01 16:06:08,238][25519] Policy head output size: 512
[2024-09-01 16:06:08,271][00194] Inference worker 0-0 is ready!
[2024-09-01 16:06:08,275][00194] All inference workers are ready! Signal rollout workers to start!
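A quick consistency check on the checkpoint names (my own sketch, not part of the log): with batch_size=1024 and env_frameskip=4 from the configuration above, each train step consumes 1024 transitions worth 4096 environment frames, which matches the `checkpoint_<train_step>_<env_steps>.pth` naming seen here. The helper names below are hypothetical, introduced only for illustration:

```python
# Hypothetical helpers, inferred from the logged config and checkpoint names:
# each train step consumes batch_size transitions, and each transition
# spans env_frameskip environment frames.
BATCH_SIZE = 1024  # batch_size in the logged config
FRAMESKIP = 4      # env_frameskip in the logged config

def env_steps_for(train_step: int) -> int:
    """Environment frames accumulated after `train_step` training steps."""
    return train_step * BATCH_SIZE * FRAMESKIP

def checkpoint_name(train_step: int) -> str:
    """File name pattern observed in the log: checkpoint_<step>_<frames>.pth."""
    return f"checkpoint_{train_step:09d}_{env_steps_for(train_step)}.pth"

print(checkpoint_name(979))  # checkpoint_000000979_4009984.pth
print(checkpoint_name(981))  # checkpoint_000000981_4018176.pth
```

This is why the resumed run reports `self.train_step=979, self.env_steps=4009984`: the two counters differ by exactly the factor batch_size * env_frameskip.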
[2024-09-01 16:06:08,469][25522] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,472][25520] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,474][25526] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,480][25521] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,486][25525] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,477][25524] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,490][25518] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:08,492][25523] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:06:10,166][25526] Decorrelating experience for 0 frames... [2024-09-01 16:06:10,171][25520] Decorrelating experience for 0 frames... [2024-09-01 16:06:10,175][25522] Decorrelating experience for 0 frames... [2024-09-01 16:06:10,526][25521] Decorrelating experience for 0 frames... [2024-09-01 16:06:10,549][25525] Decorrelating experience for 0 frames... [2024-09-01 16:06:10,554][25518] Decorrelating experience for 0 frames... [2024-09-01 16:06:10,553][25523] Decorrelating experience for 0 frames... [2024-09-01 16:06:11,487][25521] Decorrelating experience for 32 frames... [2024-09-01 16:06:11,492][25525] Decorrelating experience for 32 frames... [2024-09-01 16:06:11,731][25526] Decorrelating experience for 32 frames... [2024-09-01 16:06:11,806][25524] Decorrelating experience for 0 frames... [2024-09-01 16:06:11,986][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4009984. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:06:12,433][25520] Decorrelating experience for 32 frames... [2024-09-01 16:06:12,518][25522] Decorrelating experience for 32 frames... [2024-09-01 16:06:12,869][25521] Decorrelating experience for 64 frames... 
[2024-09-01 16:06:13,105][25523] Decorrelating experience for 32 frames... [2024-09-01 16:06:13,232][25518] Decorrelating experience for 32 frames... [2024-09-01 16:06:13,808][25524] Decorrelating experience for 32 frames... [2024-09-01 16:06:14,045][25526] Decorrelating experience for 64 frames... [2024-09-01 16:06:14,888][25521] Decorrelating experience for 96 frames... [2024-09-01 16:06:14,957][25520] Decorrelating experience for 64 frames... [2024-09-01 16:06:15,259][25518] Decorrelating experience for 64 frames... [2024-09-01 16:06:16,246][25522] Decorrelating experience for 64 frames... [2024-09-01 16:06:16,688][25524] Decorrelating experience for 64 frames... [2024-09-01 16:06:16,988][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4009984. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:06:16,991][00194] Avg episode reward: [(0, '0.320')] [2024-09-01 16:06:17,033][25526] Decorrelating experience for 96 frames... [2024-09-01 16:06:17,742][25525] Decorrelating experience for 64 frames... [2024-09-01 16:06:19,759][25518] Decorrelating experience for 96 frames... [2024-09-01 16:06:20,199][25522] Decorrelating experience for 96 frames... [2024-09-01 16:06:20,358][25520] Decorrelating experience for 96 frames... [2024-09-01 16:06:20,698][25524] Decorrelating experience for 96 frames... [2024-09-01 16:06:21,988][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4009984. Throughput: 0: 66.6. Samples: 666. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:06:21,992][00194] Avg episode reward: [(0, '0.320')] [2024-09-01 16:06:22,087][25525] Decorrelating experience for 96 frames... [2024-09-01 16:06:22,583][25523] Decorrelating experience for 64 frames... [2024-09-01 16:06:23,851][25523] Decorrelating experience for 96 frames... [2024-09-01 16:06:26,012][25505] Signal inference workers to stop experience collection... 
[2024-09-01 16:06:26,059][25519] InferenceWorker_p0-w0: stopping experience collection [2024-09-01 16:06:26,986][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4009984. Throughput: 0: 175.6. Samples: 2634. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:06:26,993][00194] Avg episode reward: [(0, '2.320')] [2024-09-01 16:06:27,908][25505] Signal inference workers to resume experience collection... [2024-09-01 16:06:27,910][25505] Stopping Batcher_0... [2024-09-01 16:06:27,913][25505] Loop batcher_evt_loop terminating... [2024-09-01 16:06:27,921][00194] Component Batcher_0 stopped! [2024-09-01 16:06:27,947][25519] Weights refcount: 2 0 [2024-09-01 16:06:27,950][25519] Stopping InferenceWorker_p0-w0... [2024-09-01 16:06:27,951][25519] Loop inference_proc0-0_evt_loop terminating... [2024-09-01 16:06:27,950][00194] Component InferenceWorker_p0-w0 stopped! [2024-09-01 16:06:28,400][25523] Stopping RolloutWorker_w4... [2024-09-01 16:06:28,400][00194] Component RolloutWorker_w4 stopped! [2024-09-01 16:06:28,402][25523] Loop rollout_proc4_evt_loop terminating... [2024-09-01 16:06:28,415][25521] Stopping RolloutWorker_w2... [2024-09-01 16:06:28,415][00194] Component RolloutWorker_w2 stopped! [2024-09-01 16:06:28,418][25521] Loop rollout_proc2_evt_loop terminating... [2024-09-01 16:06:28,431][25525] Stopping RolloutWorker_w6... [2024-09-01 16:06:28,431][00194] Component RolloutWorker_w6 stopped! [2024-09-01 16:06:28,439][25525] Loop rollout_proc6_evt_loop terminating... [2024-09-01 16:06:28,465][25520] Stopping RolloutWorker_w1... [2024-09-01 16:06:28,465][00194] Component RolloutWorker_w1 stopped! [2024-09-01 16:06:28,466][25520] Loop rollout_proc1_evt_loop terminating... [2024-09-01 16:06:28,493][25522] Stopping RolloutWorker_w3... [2024-09-01 16:06:28,493][00194] Component RolloutWorker_w3 stopped! [2024-09-01 16:06:28,493][25522] Loop rollout_proc3_evt_loop terminating... 
[2024-09-01 16:06:28,509][25526] Stopping RolloutWorker_w7... [2024-09-01 16:06:28,510][00194] Component RolloutWorker_w7 stopped! [2024-09-01 16:06:28,517][00194] Component RolloutWorker_w5 stopped! [2024-09-01 16:06:28,523][25524] Stopping RolloutWorker_w5... [2024-09-01 16:06:28,510][25526] Loop rollout_proc7_evt_loop terminating... [2024-09-01 16:06:28,524][25524] Loop rollout_proc5_evt_loop terminating... [2024-09-01 16:06:28,569][25518] Stopping RolloutWorker_w0... [2024-09-01 16:06:28,569][00194] Component RolloutWorker_w0 stopped! [2024-09-01 16:06:28,578][25518] Loop rollout_proc0_evt_loop terminating... [2024-09-01 16:06:33,646][25505] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-01 16:06:33,725][25505] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth [2024-09-01 16:06:33,737][25505] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth... [2024-09-01 16:06:33,878][00194] Component LearnerWorker_p0 stopped! [2024-09-01 16:06:33,885][00194] Waiting for process learner_proc0 to stop... [2024-09-01 16:06:33,890][25505] Stopping LearnerWorker_p0... [2024-09-01 16:06:33,891][25505] Loop learner_proc0_evt_loop terminating... [2024-09-01 16:06:34,540][00194] Waiting for process inference_proc0-0 to join... [2024-09-01 16:06:34,545][00194] Waiting for process rollout_proc0 to join... [2024-09-01 16:06:34,550][00194] Waiting for process rollout_proc1 to join... [2024-09-01 16:06:34,556][00194] Waiting for process rollout_proc2 to join... [2024-09-01 16:06:34,560][00194] Waiting for process rollout_proc3 to join... [2024-09-01 16:06:34,566][00194] Waiting for process rollout_proc4 to join... [2024-09-01 16:06:34,570][00194] Waiting for process rollout_proc5 to join... [2024-09-01 16:06:34,574][00194] Waiting for process rollout_proc6 to join... [2024-09-01 16:06:34,580][00194] Waiting for process rollout_proc7 to join... 
[2024-09-01 16:06:34,583][00194] Batcher 0 profile tree view:
batching: 0.0506, releasing_batches: 0.0020
[2024-09-01 16:06:34,586][00194] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0646
wait_policy: 0.0001
  wait_policy_total: 9.7355
one_step: 0.0318
  handle_policy_step: 7.4827
    deserialize: 0.2000, stack: 0.0383, obs_to_device_normalize: 1.1280, forward: 5.5730, send_messages: 0.2052
    prepare_outputs: 0.1651
      to_cpu: 0.0130
[2024-09-01 16:06:34,590][00194] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 4.1518
train: 6.0339
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0007, after_optimizer: 0.0047
  calculate_losses: 2.2055
    losses_init: 0.0000, forward_head: 1.9855, bptt_initial: 0.0043, tail: 0.0103, advantages_returns: 0.0010, losses: 0.0028
    bptt: 0.2010
      bptt_forward_core: 0.1998
  update: 3.8215
    clip: 0.0086
[2024-09-01 16:06:34,592][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.1724, env_step: 2.7167, overhead: 0.0721, complete_rollouts: 0.0140
save_policy_outputs: 0.1292
  split_output_tensors: 0.0193
[2024-09-01 16:06:34,595][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0038, enqueue_policy_requests: 0.6344, env_step: 4.8570, overhead: 0.1751, complete_rollouts: 0.0165
save_policy_outputs: 0.2269
  split_output_tensors: 0.0810
[2024-09-01 16:06:34,599][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 16:06:34,603][00194] Runner profile tree view:
main_loop: 48.6341
[2024-09-01 16:06:34,605][00194] Collected {0: 4018176}, FPS: 168.4
[2024-09-01 16:06:48,086][00194] Environment doom_basic already registered, overwriting...
[2024-09-01 16:06:48,089][00194] Environment doom_two_colors_easy already registered, overwriting...
[2024-09-01 16:06:48,092][00194] Environment doom_two_colors_hard already registered, overwriting...
[2024-09-01 16:06:48,097][00194] Environment doom_dm already registered, overwriting...
[2024-09-01 16:06:48,100][00194] Environment doom_dwango5 already registered, overwriting... [2024-09-01 16:06:48,101][00194] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-01 16:06:48,103][00194] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-01 16:06:48,104][00194] Environment doom_my_way_home already registered, overwriting... [2024-09-01 16:06:48,106][00194] Environment doom_deadly_corridor already registered, overwriting... [2024-09-01 16:06:48,107][00194] Environment doom_defend_the_center already registered, overwriting... [2024-09-01 16:06:48,109][00194] Environment doom_defend_the_line already registered, overwriting... [2024-09-01 16:06:48,110][00194] Environment doom_health_gathering already registered, overwriting... [2024-09-01 16:06:48,112][00194] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-01 16:06:48,113][00194] Environment doom_battle already registered, overwriting... [2024-09-01 16:06:48,115][00194] Environment doom_battle2 already registered, overwriting... [2024-09-01 16:06:48,116][00194] Environment doom_duel_bots already registered, overwriting... [2024-09-01 16:06:48,117][00194] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-01 16:06:48,119][00194] Environment doom_duel already registered, overwriting... [2024-09-01 16:06:48,121][00194] Environment doom_deathmatch_full already registered, overwriting... [2024-09-01 16:06:48,122][00194] Environment doom_benchmark already registered, overwriting... 
[2024-09-01 16:06:48,124][00194] register_encoder_factory: [2024-09-01 16:06:48,154][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-01 16:06:48,160][00194] Overriding arg 'train_for_env_steps' with value 6000000 passed from command line [2024-09-01 16:06:48,167][00194] Experiment dir /content/train_dir/default_experiment already exists! [2024-09-01 16:06:48,171][00194] Resuming existing experiment from /content/train_dir/default_experiment... [2024-09-01 16:06:48,172][00194] Weights and Biases integration disabled [2024-09-01 16:06:48,177][00194] Environment var CUDA_VISIBLE_DEVICES is [2024-09-01 16:06:50,270][00194] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=cpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True 
experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=6000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --device=cpu --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'device': 'cpu', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-09-01 16:06:50,273][00194] Saving configuration to /content/train_dir/default_experiment/config.json... 
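For orientation, a back-of-the-envelope estimate (mine, not from the log): the run resumes from the checkpoint at 4,018,176 env frames with the new target train_for_env_steps=6000000, so roughly 1.98M frames remain; at batch_size=1024 and env_frameskip=4 that works out to about 484 more train steps:

```python
# Rough estimate of the remaining work after the train_for_env_steps override.
# Input values are taken from the logged config and checkpoint name.
target_env_steps = 6_000_000      # train_for_env_steps after the CLI override
resumed_env_steps = 4_018_176     # from checkpoint_000000981_4018176.pth
frames_per_train_step = 1024 * 4  # batch_size * env_frameskip

remaining_frames = target_env_steps - resumed_env_steps
remaining_train_steps = -(-remaining_frames // frames_per_train_step)  # ceil div
print(remaining_frames, remaining_train_steps)  # 1981824 484
```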
[2024-09-01 16:06:50,277][00194] Rollout worker 0 uses device cpu [2024-09-01 16:06:50,279][00194] Rollout worker 1 uses device cpu [2024-09-01 16:06:50,281][00194] Rollout worker 2 uses device cpu [2024-09-01 16:06:50,283][00194] Rollout worker 3 uses device cpu [2024-09-01 16:06:50,284][00194] Rollout worker 4 uses device cpu [2024-09-01 16:06:50,286][00194] Rollout worker 5 uses device cpu [2024-09-01 16:06:50,287][00194] Rollout worker 6 uses device cpu [2024-09-01 16:06:50,288][00194] Rollout worker 7 uses device cpu [2024-09-01 16:06:50,458][00194] InferenceWorker_p0-w0: min num requests: 2 [2024-09-01 16:06:50,500][00194] Starting all processes... [2024-09-01 16:06:50,502][00194] Starting process learner_proc0 [2024-09-01 16:06:50,557][00194] Starting all processes... [2024-09-01 16:06:50,565][00194] Starting process inference_proc0-0 [2024-09-01 16:06:50,566][00194] Starting process rollout_proc0 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc1 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc2 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc3 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc4 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc5 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc6 [2024-09-01 16:06:50,568][00194] Starting process rollout_proc7 [2024-09-01 16:07:05,585][26021] Worker 5 uses CPU cores [1] [2024-09-01 16:07:05,609][26019] Worker 3 uses CPU cores [1] [2024-09-01 16:07:05,647][26016] Worker 0 uses CPU cores [0] [2024-09-01 16:07:05,921][26018] Worker 1 uses CPU cores [1] [2024-09-01 16:07:05,924][26020] Worker 4 uses CPU cores [0] [2024-09-01 16:07:05,960][26002] Starting seed is not provided [2024-09-01 16:07:05,961][26002] Initializing actor-critic model on device cpu [2024-09-01 16:07:05,961][26002] RunningMeanStd input shape: (3, 72, 128) [2024-09-01 16:07:05,963][26002] RunningMeanStd input shape: (1,) [2024-09-01 16:07:06,027][26022] Worker 6 uses 
CPU cores [0]
[2024-09-01 16:07:06,034][26002] ConvEncoder: input_channels=3
[2024-09-01 16:07:06,109][26023] Worker 7 uses CPU cores [1]
[2024-09-01 16:07:06,119][26017] Worker 2 uses CPU cores [0]
[2024-09-01 16:07:06,249][26002] Conv encoder output size: 512
[2024-09-01 16:07:06,250][26002] Policy head output size: 512
[2024-09-01 16:07:06,267][26002] Created Actor Critic model with architecture:
[2024-09-01 16:07:06,267][26002] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-01 16:07:06,769][26002] Using optimizer
[2024-09-01 16:07:06,771][26002] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth...
[2024-09-01 16:07:06,812][26002] Loading model from checkpoint [2024-09-01 16:07:06,841][26002] Loaded experiment state at self.train_step=981, self.env_steps=4018176 [2024-09-01 16:07:06,842][26002] Initialized policy 0 weights for model version 981 [2024-09-01 16:07:06,844][26002] LearnerWorker_p0 finished initialization! [2024-09-01 16:07:06,849][26015] RunningMeanStd input shape: (3, 72, 128) [2024-09-01 16:07:06,850][26015] RunningMeanStd input shape: (1,) [2024-09-01 16:07:06,874][26015] ConvEncoder: input_channels=3 [2024-09-01 16:07:07,027][26015] Conv encoder output size: 512 [2024-09-01 16:07:07,028][26015] Policy head output size: 512 [2024-09-01 16:07:07,050][00194] Inference worker 0-0 is ready! [2024-09-01 16:07:07,052][00194] All inference workers are ready! Signal rollout workers to start! [2024-09-01 16:07:07,187][26023] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,190][26019] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,193][26021] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,203][26018] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,227][26022] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,224][26016] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,252][26017] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:07,258][26020] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-01 16:07:08,177][00194] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4018176. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:07:08,257][26022] Decorrelating experience for 0 frames... [2024-09-01 16:07:08,273][26020] Decorrelating experience for 0 frames... [2024-09-01 16:07:09,124][26023] Decorrelating experience for 0 frames... 
[2024-09-01 16:07:09,130][26021] Decorrelating experience for 0 frames... [2024-09-01 16:07:09,129][26019] Decorrelating experience for 0 frames... [2024-09-01 16:07:09,141][26018] Decorrelating experience for 0 frames... [2024-09-01 16:07:09,195][26022] Decorrelating experience for 32 frames... [2024-09-01 16:07:09,216][26020] Decorrelating experience for 32 frames... [2024-09-01 16:07:10,447][00194] Heartbeat connected on Batcher_0 [2024-09-01 16:07:10,453][00194] Heartbeat connected on LearnerWorker_p0 [2024-09-01 16:07:10,493][00194] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-01 16:07:10,542][26023] Decorrelating experience for 32 frames... [2024-09-01 16:07:10,545][26018] Decorrelating experience for 32 frames... [2024-09-01 16:07:10,607][26017] Decorrelating experience for 0 frames... [2024-09-01 16:07:10,649][26016] Decorrelating experience for 0 frames... [2024-09-01 16:07:10,787][26019] Decorrelating experience for 32 frames... [2024-09-01 16:07:10,869][26020] Decorrelating experience for 64 frames... [2024-09-01 16:07:11,744][26021] Decorrelating experience for 32 frames... [2024-09-01 16:07:11,852][26018] Decorrelating experience for 64 frames... [2024-09-01 16:07:12,576][26016] Decorrelating experience for 32 frames... [2024-09-01 16:07:12,593][26017] Decorrelating experience for 32 frames... [2024-09-01 16:07:12,868][26022] Decorrelating experience for 64 frames... [2024-09-01 16:07:13,171][26020] Decorrelating experience for 96 frames... [2024-09-01 16:07:13,178][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4018176. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:07:13,611][26021] Decorrelating experience for 64 frames... [2024-09-01 16:07:13,699][00194] Heartbeat connected on RolloutWorker_w4 [2024-09-01 16:07:13,834][26018] Decorrelating experience for 96 frames... 
[2024-09-01 16:07:14,414][00194] Heartbeat connected on RolloutWorker_w1 [2024-09-01 16:07:15,125][26016] Decorrelating experience for 64 frames... [2024-09-01 16:07:15,224][26023] Decorrelating experience for 64 frames... [2024-09-01 16:07:17,374][26021] Decorrelating experience for 96 frames... [2024-09-01 16:07:17,847][00194] Heartbeat connected on RolloutWorker_w5 [2024-09-01 16:07:18,177][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4018176. Throughput: 0: 40.2. Samples: 402. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:07:18,184][00194] Avg episode reward: [(0, '3.420')] [2024-09-01 16:07:18,806][26022] Decorrelating experience for 96 frames... [2024-09-01 16:07:18,892][26017] Decorrelating experience for 64 frames... [2024-09-01 16:07:19,271][26016] Decorrelating experience for 96 frames... [2024-09-01 16:07:19,611][00194] Heartbeat connected on RolloutWorker_w6 [2024-09-01 16:07:20,302][00194] Heartbeat connected on RolloutWorker_w0 [2024-09-01 16:07:22,819][26019] Decorrelating experience for 64 frames... [2024-09-01 16:07:23,177][00194] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4018176. Throughput: 0: 108.4. Samples: 1626. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-01 16:07:23,179][00194] Avg episode reward: [(0, '4.904')] [2024-09-01 16:07:23,689][26017] Decorrelating experience for 96 frames... [2024-09-01 16:07:24,318][00194] Heartbeat connected on RolloutWorker_w2 [2024-09-01 16:07:24,681][26002] Signal inference workers to stop experience collection... [2024-09-01 16:07:24,723][26015] InferenceWorker_p0-w0: stopping experience collection [2024-09-01 16:07:25,227][26023] Decorrelating experience for 96 frames... [2024-09-01 16:07:25,403][00194] Heartbeat connected on RolloutWorker_w7 [2024-09-01 16:07:25,464][26019] Decorrelating experience for 96 frames... 
[2024-09-01 16:07:25,565][00194] Heartbeat connected on RolloutWorker_w3
[2024-09-01 16:07:25,848][26002] Signal inference workers to resume experience collection...
[2024-09-01 16:07:25,849][26015] InferenceWorker_p0-w0: resuming experience collection
[2024-09-01 16:07:28,177][00194] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4022272. Throughput: 0: 164.8. Samples: 3296. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 16:07:28,185][00194] Avg episode reward: [(0, '4.277')]
[2024-09-01 16:07:33,179][00194] Fps is (10 sec: 819.0, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 4026368. Throughput: 0: 149.4. Samples: 3736. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-01 16:07:33,183][00194] Avg episode reward: [(0, '7.907')]
[2024-09-01 16:07:38,179][00194] Fps is (10 sec: 819.1, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4030464. Throughput: 0: 148.7. Samples: 4460. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:07:38,188][00194] Avg episode reward: [(0, '8.065')]
[2024-09-01 16:07:43,177][00194] Fps is (10 sec: 819.4, 60 sec: 468.1, 300 sec: 468.1). Total num frames: 4034560. Throughput: 0: 166.7. Samples: 5836. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:07:43,184][00194] Avg episode reward: [(0, '8.912')]
[2024-09-01 16:07:48,177][00194] Fps is (10 sec: 819.3, 60 sec: 512.0, 300 sec: 512.0). Total num frames: 4038656. Throughput: 0: 163.5. Samples: 6540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:07:48,180][00194] Avg episode reward: [(0, '9.919')]
[2024-09-01 16:07:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 4042752. Throughput: 0: 184.8. Samples: 8314. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:07:53,180][00194] Avg episode reward: [(0, '10.831')]
[2024-09-01 16:07:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 573.4, 300 sec: 573.4). Total num frames: 4046848. Throughput: 0: 203.1. Samples: 9140. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:07:58,183][00194] Avg episode reward: [(0, '11.359')]
[2024-09-01 16:08:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 595.8, 300 sec: 595.8). Total num frames: 4050944. Throughput: 0: 216.0. Samples: 10124. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:08:03,185][00194] Avg episode reward: [(0, '12.138')]
[2024-09-01 16:08:07,361][26015] Updated weights for policy 0, policy_version 991 (0.1120)
[2024-09-01 16:08:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 682.7, 300 sec: 682.7). Total num frames: 4059136. Throughput: 0: 223.2. Samples: 11668. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:08,183][00194] Avg episode reward: [(0, '12.743')]
[2024-09-01 16:08:13,182][00194] Fps is (10 sec: 1228.2, 60 sec: 750.9, 300 sec: 693.1). Total num frames: 4063232. Throughput: 0: 203.1. Samples: 12436. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:13,193][00194] Avg episode reward: [(0, '13.246')]
[2024-09-01 16:08:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 702.2). Total num frames: 4067328. Throughput: 0: 216.7. Samples: 13488. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:18,185][00194] Avg episode reward: [(0, '13.947')]
[2024-09-01 16:08:23,179][00194] Fps is (10 sec: 819.5, 60 sec: 887.4, 300 sec: 710.0). Total num frames: 4071424. Throughput: 0: 230.4. Samples: 14830. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:23,181][00194] Avg episode reward: [(0, '14.673')]
[2024-09-01 16:08:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 716.8). Total num frames: 4075520. Throughput: 0: 240.9. Samples: 16678. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:28,180][00194] Avg episode reward: [(0, '14.769')]
[2024-09-01 16:08:33,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 722.8). Total num frames: 4079616. Throughput: 0: 229.7. Samples: 16876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:33,191][00194] Avg episode reward: [(0, '14.831')]
[2024-09-01 16:08:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 728.2). Total num frames: 4083712. Throughput: 0: 228.4. Samples: 18592. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:38,180][00194] Avg episode reward: [(0, '15.019')]
[2024-09-01 16:08:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 776.1). Total num frames: 4091904. Throughput: 0: 224.6. Samples: 19246. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:08:43,179][00194] Avg episode reward: [(0, '15.753')]
[2024-09-01 16:08:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 778.2). Total num frames: 4096000. Throughput: 0: 235.2. Samples: 20710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:08:48,181][00194] Avg episode reward: [(0, '15.722')]
[2024-09-01 16:08:53,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 741.2). Total num frames: 4096000. Throughput: 0: 227.3. Samples: 21896. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:08:53,184][00194] Avg episode reward: [(0, '15.941')]
[2024-09-01 16:08:53,307][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001001_4100096.pth...
[2024-09-01 16:08:53,313][26015] Updated weights for policy 0, policy_version 1001 (0.2141)
[2024-09-01 16:08:53,423][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth
[2024-09-01 16:08:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 782.0). Total num frames: 4104192. Throughput: 0: 239.6. Samples: 23218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:08:58,179][00194] Avg episode reward: [(0, '16.570')]
[2024-09-01 16:09:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 783.6). Total num frames: 4108288. Throughput: 0: 236.9. Samples: 24148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:03,180][00194] Avg episode reward: [(0, '17.257')]
[2024-09-01 16:09:08,181][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 785.0). Total num frames: 4112384. Throughput: 0: 230.3. Samples: 25192. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:08,184][00194] Avg episode reward: [(0, '17.191')]
[2024-09-01 16:09:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 786.4). Total num frames: 4116480. Throughput: 0: 222.3. Samples: 26682. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:13,179][00194] Avg episode reward: [(0, '18.269')]
[2024-09-01 16:09:18,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 787.7). Total num frames: 4120576. Throughput: 0: 237.4. Samples: 27560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:18,183][00194] Avg episode reward: [(0, '19.897')]
[2024-09-01 16:09:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 788.9). Total num frames: 4124672. Throughput: 0: 232.3. Samples: 29044. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:23,180][00194] Avg episode reward: [(0, '19.967')]
[2024-09-01 16:09:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 789.9). Total num frames: 4128768. Throughput: 0: 244.5. Samples: 30250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:28,187][00194] Avg episode reward: [(0, '20.032')]
[2024-09-01 16:09:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 4136960. Throughput: 0: 226.6. Samples: 30908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:33,180][00194] Avg episode reward: [(0, '21.086')]
[2024-09-01 16:09:36,538][26015] Updated weights for policy 0, policy_version 1011 (0.2612)
[2024-09-01 16:09:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 4141056. Throughput: 0: 235.8. Samples: 32508. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:38,182][00194] Avg episode reward: [(0, '21.683')]
[2024-09-01 16:09:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4145152. Throughput: 0: 233.7. Samples: 33734. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:09:43,180][00194] Avg episode reward: [(0, '21.870')]
[2024-09-01 16:09:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4149248. Throughput: 0: 226.2. Samples: 34326. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:09:48,180][00194] Avg episode reward: [(0, '21.780')]
[2024-09-01 16:09:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 819.2). Total num frames: 4153344. Throughput: 0: 236.4. Samples: 35830. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:53,180][00194] Avg episode reward: [(0, '22.352')]
[2024-09-01 16:09:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4157440. Throughput: 0: 238.3. Samples: 37406. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:09:58,180][00194] Avg episode reward: [(0, '22.221')]
[2024-09-01 16:10:03,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 819.2). Total num frames: 4161536. Throughput: 0: 230.8. Samples: 37948. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:10:03,197][00194] Avg episode reward: [(0, '22.833')]
[2024-09-01 16:10:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 819.2). Total num frames: 4165632. Throughput: 0: 233.2. Samples: 39540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:10:08,186][00194] Avg episode reward: [(0, '22.833')]
[2024-09-01 16:10:13,177][00194] Fps is (10 sec: 1229.1, 60 sec: 955.7, 300 sec: 841.3). Total num frames: 4173824. Throughput: 0: 233.6. Samples: 40762. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:10:13,185][00194] Avg episode reward: [(0, '22.382')]
[2024-09-01 16:10:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 840.8). Total num frames: 4177920. Throughput: 0: 238.2. Samples: 41626. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:18,183][00194] Avg episode reward: [(0, '22.234')]
[2024-09-01 16:10:22,418][26015] Updated weights for policy 0, policy_version 1021 (0.1004)
[2024-09-01 16:10:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 840.2). Total num frames: 4182016. Throughput: 0: 225.6. Samples: 42662. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:23,179][00194] Avg episode reward: [(0, '22.592')]
[2024-09-01 16:10:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 839.7). Total num frames: 4186112. Throughput: 0: 235.5. Samples: 44332. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:28,180][00194] Avg episode reward: [(0, '22.806')]
[2024-09-01 16:10:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 839.2). Total num frames: 4190208. Throughput: 0: 241.3. Samples: 45184. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:33,180][00194] Avg episode reward: [(0, '23.284')]
[2024-09-01 16:10:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.7). Total num frames: 4194304. Throughput: 0: 233.5. Samples: 46338. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:38,182][00194] Avg episode reward: [(0, '23.781')]
[2024-09-01 16:10:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 838.3). Total num frames: 4198400. Throughput: 0: 228.7. Samples: 47696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:10:43,182][00194] Avg episode reward: [(0, '24.124')]
[2024-09-01 16:10:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 856.4). Total num frames: 4206592. Throughput: 0: 237.8. Samples: 48648.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:48,180][00194] Avg episode reward: [(0, '24.933')]
[2024-09-01 16:10:52,079][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001028_4210688.pth...
[2024-09-01 16:10:52,205][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000981_4018176.pth
[2024-09-01 16:10:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 855.6). Total num frames: 4210688. Throughput: 0: 226.8. Samples: 49744. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:53,184][00194] Avg episode reward: [(0, '24.570')]
[2024-09-01 16:10:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 854.8). Total num frames: 4214784. Throughput: 0: 224.5. Samples: 50866. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:10:58,180][00194] Avg episode reward: [(0, '23.878')]
[2024-09-01 16:11:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.8, 300 sec: 854.1). Total num frames: 4218880. Throughput: 0: 226.7. Samples: 51828. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:03,180][00194] Avg episode reward: [(0, '24.630')]
[2024-09-01 16:11:05,855][26015] Updated weights for policy 0, policy_version 1031 (0.1467)
[2024-09-01 16:11:08,178][00194] Fps is (10 sec: 819.1, 60 sec: 955.7, 300 sec: 853.3). Total num frames: 4222976. Throughput: 0: 238.5. Samples: 53394. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:08,181][00194] Avg episode reward: [(0, '24.416')]
[2024-09-01 16:11:08,999][26002] Signal inference workers to stop experience collection... (50 times)
[2024-09-01 16:11:09,060][26015] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2024-09-01 16:11:10,186][26002] Signal inference workers to resume experience collection... (50 times)
[2024-09-01 16:11:10,187][26015] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2024-09-01 16:11:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 852.6). Total num frames: 4227072. Throughput: 0: 223.8. Samples: 54402. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:13,184][00194] Avg episode reward: [(0, '23.874')]
[2024-09-01 16:11:18,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 852.0). Total num frames: 4231168. Throughput: 0: 217.6. Samples: 54976. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:18,179][00194] Avg episode reward: [(0, '23.679')]
[2024-09-01 16:11:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 851.3). Total num frames: 4235264. Throughput: 0: 236.1. Samples: 56962. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:11:23,185][00194] Avg episode reward: [(0, '24.322')]
[2024-09-01 16:11:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.7). Total num frames: 4239360. Throughput: 0: 229.0. Samples: 58000. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:11:28,184][00194] Avg episode reward: [(0, '24.549')]
[2024-09-01 16:11:33,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 850.1). Total num frames: 4243456. Throughput: 0: 220.7. Samples: 58580. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:11:33,189][00194] Avg episode reward: [(0, '24.854')]
[2024-09-01 16:11:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 849.5). Total num frames: 4247552. Throughput: 0: 220.2. Samples: 59654. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:11:38,184][00194] Avg episode reward: [(0, '24.462')]
[2024-09-01 16:11:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 849.0). Total num frames: 4251648. Throughput: 0: 222.4. Samples: 60874. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:43,184][00194] Avg episode reward: [(0, '24.454')]
[2024-09-01 16:11:48,179][00194] Fps is (10 sec: 819.0, 60 sec: 819.2, 300 sec: 848.4). Total num frames: 4255744. Throughput: 0: 214.0. Samples: 61460. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:48,183][00194] Avg episode reward: [(0, '24.499')]
[2024-09-01 16:11:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 847.9). Total num frames: 4259840. Throughput: 0: 204.0. Samples: 62572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:53,180][00194] Avg episode reward: [(0, '24.757')]
[2024-09-01 16:11:54,518][26015] Updated weights for policy 0, policy_version 1041 (0.1754)
[2024-09-01 16:11:58,177][00194] Fps is (10 sec: 819.4, 60 sec: 819.2, 300 sec: 847.4). Total num frames: 4263936. Throughput: 0: 215.5. Samples: 64100. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:11:58,182][00194] Avg episode reward: [(0, '24.680')]
[2024-09-01 16:12:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 860.9). Total num frames: 4272128. Throughput: 0: 225.3. Samples: 65116. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:03,180][00194] Avg episode reward: [(0, '25.100')]
[2024-09-01 16:12:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4276224. Throughput: 0: 204.6. Samples: 66168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:08,182][00194] Avg episode reward: [(0, '25.220')]
[2024-09-01 16:12:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4280320. Throughput: 0: 210.9. Samples: 67492. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:13,184][00194] Avg episode reward: [(0, '25.315')]
[2024-09-01 16:12:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4284416. Throughput: 0: 213.9. Samples: 68206. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:18,182][00194] Avg episode reward: [(0, '25.625')]
[2024-09-01 16:12:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4288512. Throughput: 0: 225.8. Samples: 69814. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:23,180][00194] Avg episode reward: [(0, '25.851')]
[2024-09-01 16:12:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4292608. Throughput: 0: 222.7. Samples: 70894. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:28,185][00194] Avg episode reward: [(0, '25.901')]
[2024-09-01 16:12:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4296704. Throughput: 0: 222.2. Samples: 71458. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:12:33,180][00194] Avg episode reward: [(0, '26.142')]
[2024-09-01 16:12:37,639][26002] Saving new best policy, reward=26.142!
[2024-09-01 16:12:37,655][26015] Updated weights for policy 0, policy_version 1051 (0.1676)
[2024-09-01 16:12:38,179][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4304896. Throughput: 0: 238.1. Samples: 73286. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:38,189][00194] Avg episode reward: [(0, '26.086')]
[2024-09-01 16:12:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4304896. Throughput: 0: 226.9. Samples: 74310. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:43,184][00194] Avg episode reward: [(0, '25.556')]
[2024-09-01 16:12:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 4313088. Throughput: 0: 218.8. Samples: 74962. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:48,189][00194] Avg episode reward: [(0, '25.942')]
[2024-09-01 16:12:51,983][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001054_4317184.pth...
[2024-09-01 16:12:52,091][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001001_4100096.pth
[2024-09-01 16:12:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4317184. Throughput: 0: 228.3. Samples: 76440. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:53,180][00194] Avg episode reward: [(0, '25.123')]
[2024-09-01 16:12:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4321280. Throughput: 0: 232.6. Samples: 77960. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:12:58,183][00194] Avg episode reward: [(0, '25.533')]
[2024-09-01 16:13:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4325376. Throughput: 0: 226.7. Samples: 78406. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:03,179][00194] Avg episode reward: [(0, '25.730')]
[2024-09-01 16:13:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4329472. Throughput: 0: 222.8. Samples: 79842. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:08,185][00194] Avg episode reward: [(0, '26.175')]
[2024-09-01 16:13:09,985][26002] Saving new best policy, reward=26.175!
[2024-09-01 16:13:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4333568. Throughput: 0: 235.3. Samples: 81482. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:13,182][00194] Avg episode reward: [(0, '25.694')]
[2024-09-01 16:13:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4337664. Throughput: 0: 235.0. Samples: 82034. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:13:18,180][00194] Avg episode reward: [(0, '26.236')]
[2024-09-01 16:13:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4341760. Throughput: 0: 218.6. Samples: 83124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:13:23,180][00194] Avg episode reward: [(0, '26.997')]
[2024-09-01 16:13:24,045][26002] Saving new best policy, reward=26.236!
[2024-09-01 16:13:24,052][26015] Updated weights for policy 0, policy_version 1061 (0.0564)
[2024-09-01 16:13:27,854][26002] Saving new best policy, reward=26.997!
[2024-09-01 16:13:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4349952. Throughput: 0: 228.0. Samples: 84570. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:28,180][00194] Avg episode reward: [(0, '26.932')]
[2024-09-01 16:13:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4354048. Throughput: 0: 234.2. Samples: 85500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:33,179][00194] Avg episode reward: [(0, '27.021')]
[2024-09-01 16:13:37,194][26002] Saving new best policy, reward=27.021!
[2024-09-01 16:13:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4358144. Throughput: 0: 222.4. Samples: 86446. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:38,183][00194] Avg episode reward: [(0, '27.035')]
[2024-09-01 16:13:41,904][26002] Saving new best policy, reward=27.035!
[2024-09-01 16:13:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4362240. Throughput: 0: 221.6. Samples: 87930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:13:43,184][00194] Avg episode reward: [(0, '26.874')]
[2024-09-01 16:13:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 4366336. Throughput: 0: 225.2. Samples: 88540. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:13:48,179][00194] Avg episode reward: [(0, '26.624')]
[2024-09-01 16:13:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4370432. Throughput: 0: 229.1. Samples: 90150. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:13:53,180][00194] Avg episode reward: [(0, '26.624')]
[2024-09-01 16:13:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4374528. Throughput: 0: 217.7. Samples: 91278. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:13:58,184][00194] Avg episode reward: [(0, '26.628')]
[2024-09-01 16:14:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4378624. Throughput: 0: 223.2. Samples: 92076. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:14:03,186][00194] Avg episode reward: [(0, '25.625')]
[2024-09-01 16:14:07,921][26015] Updated weights for policy 0, policy_version 1071 (0.1072)
[2024-09-01 16:14:08,178][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4386816. Throughput: 0: 233.1. Samples: 93612. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:14:08,185][00194] Avg episode reward: [(0, '25.526')]
[2024-09-01 16:14:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4390912. Throughput: 0: 224.5. Samples: 94674. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:13,181][00194] Avg episode reward: [(0, '25.796')]
[2024-09-01 16:14:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4395008. Throughput: 0: 219.4. Samples: 95372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:18,180][00194] Avg episode reward: [(0, '25.692')]
[2024-09-01 16:14:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4399104. Throughput: 0: 228.6. Samples: 96732. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:23,182][00194] Avg episode reward: [(0, '25.448')]
[2024-09-01 16:14:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4403200. Throughput: 0: 236.1. Samples: 98556. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:28,181][00194] Avg episode reward: [(0, '24.703')]
[2024-09-01 16:14:33,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4407296. Throughput: 0: 229.3. Samples: 98858. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:33,184][00194] Avg episode reward: [(0, '24.815')]
[2024-09-01 16:14:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4411392. Throughput: 0: 223.9. Samples: 100224. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:14:38,179][00194] Avg episode reward: [(0, '24.032')]
[2024-09-01 16:14:43,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4415488. Throughput: 0: 235.2. Samples: 101864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:14:43,185][00194] Avg episode reward: [(0, '23.589')]
[2024-09-01 16:14:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4419584. Throughput: 0: 230.4. Samples: 102444. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:14:48,180][00194] Avg episode reward: [(0, '23.996')]
[2024-09-01 16:14:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4423680. Throughput: 0: 222.4. Samples: 103622. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:14:53,185][00194] Avg episode reward: [(0, '23.741')]
[2024-09-01 16:14:53,813][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001081_4427776.pth...
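The checkpoint filenames in this log pair a zero-padded policy version with an env-step count, and in every instance above the step count equals the policy version times 4096 (e.g. 1081 × 4096 = 4,427,776), which suggests this run collects roughly 4096 environment frames per policy update. That relationship is inferred purely from the filenames in this log, not from any documented Sample Factory contract; the sketch below just checks it for the checkpoints seen here:

```python
# Checkpoint names in this log look like checkpoint_<policy_version>_<env_steps>.pth.
# The 4096 frames-per-update stride is an observation about this run, not a spec.
checkpoints = [
    ("checkpoint_000001001_4100096.pth", 1001, 4100096),
    ("checkpoint_000001028_4210688.pth", 1028, 4210688),
    ("checkpoint_000001054_4317184.pth", 1054, 4317184),
    ("checkpoint_000001081_4427776.pth", 1081, 4427776),
]

for name, version, steps in checkpoints:
    # Verify env_steps == policy_version * 4096 for each checkpoint in the log.
    assert steps == version * 4096, name

print("all checkpoint step counts are policy_version * 4096")
```

The log also shows the rotation policy at work: each `Saving ...` line for a new checkpoint is followed shortly by a `Removing ...` line for the oldest one, keeping a bounded number of checkpoints on disk.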
[2024-09-01 16:14:53,817][26015] Updated weights for policy 0, policy_version 1081 (0.2107) [2024-09-01 16:14:53,927][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001028_4210688.pth [2024-09-01 16:14:56,082][26002] Signal inference workers to stop experience collection... (100 times) [2024-09-01 16:14:56,152][26015] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-09-01 16:14:57,565][26002] Signal inference workers to resume experience collection... (100 times) [2024-09-01 16:14:57,566][26015] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-09-01 16:14:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4431872. Throughput: 0: 231.9. Samples: 105108. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:14:58,182][00194] Avg episode reward: [(0, '24.017')] [2024-09-01 16:15:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4435968. Throughput: 0: 233.4. Samples: 105874. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:15:03,180][00194] Avg episode reward: [(0, '23.470')] [2024-09-01 16:15:08,185][00194] Fps is (10 sec: 818.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4440064. Throughput: 0: 229.3. Samples: 107050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:08,198][00194] Avg episode reward: [(0, '22.411')] [2024-09-01 16:15:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4444160. Throughput: 0: 223.7. Samples: 108622. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:13,179][00194] Avg episode reward: [(0, '22.251')] [2024-09-01 16:15:18,177][00194] Fps is (10 sec: 819.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4448256. Throughput: 0: 232.1. Samples: 109304. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:18,180][00194] Avg episode reward: [(0, '22.522')] [2024-09-01 16:15:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4452352. Throughput: 0: 233.8. Samples: 110744. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:23,180][00194] Avg episode reward: [(0, '22.716')] [2024-09-01 16:15:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4456448. Throughput: 0: 224.8. Samples: 111978. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:28,179][00194] Avg episode reward: [(0, '22.022')] [2024-09-01 16:15:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 4464640. Throughput: 0: 226.5. Samples: 112638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:33,183][00194] Avg episode reward: [(0, '22.062')] [2024-09-01 16:15:37,299][26015] Updated weights for policy 0, policy_version 1091 (0.1942) [2024-09-01 16:15:38,180][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4468736. Throughput: 0: 235.2. Samples: 114206. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:38,183][00194] Avg episode reward: [(0, '21.303')] [2024-09-01 16:15:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4472832. Throughput: 0: 225.9. Samples: 115272. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:43,180][00194] Avg episode reward: [(0, '21.581')] [2024-09-01 16:15:48,177][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4476928. Throughput: 0: 224.9. Samples: 115994. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:48,179][00194] Avg episode reward: [(0, '22.360')] [2024-09-01 16:15:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4481024. Throughput: 0: 237.6. Samples: 117742. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:53,181][00194] Avg episode reward: [(0, '21.657')] [2024-09-01 16:15:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4485120. Throughput: 0: 236.6. Samples: 119268. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:15:58,180][00194] Avg episode reward: [(0, '21.595')] [2024-09-01 16:16:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4489216. Throughput: 0: 229.3. Samples: 119624. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:16:03,183][00194] Avg episode reward: [(0, '22.423')] [2024-09-01 16:16:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 4493312. Throughput: 0: 234.3. Samples: 121288. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:16:08,189][00194] Avg episode reward: [(0, '22.373')] [2024-09-01 16:16:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4497408. Throughput: 0: 221.8. Samples: 121960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:16:13,181][00194] Avg episode reward: [(0, '22.370')] [2024-09-01 16:16:18,180][00194] Fps is (10 sec: 409.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4497408. Throughput: 0: 214.6. Samples: 122294. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:16:18,186][00194] Avg episode reward: [(0, '22.165')] [2024-09-01 16:16:23,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4501504. Throughput: 0: 196.3. Samples: 123040. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:16:23,183][00194] Avg episode reward: [(0, '22.188')] [2024-09-01 16:16:28,177][00194] Fps is (10 sec: 819.5, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4505600. Throughput: 0: 202.1. Samples: 124366. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:16:28,182][00194] Avg episode reward: [(0, '22.167')] [2024-09-01 16:16:28,861][26015] Updated weights for policy 0, policy_version 1101 (0.2124) [2024-09-01 16:16:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4513792. Throughput: 0: 207.0. Samples: 125310. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:16:33,180][00194] Avg episode reward: [(0, '22.660')] [2024-09-01 16:16:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4517888. Throughput: 0: 193.6. Samples: 126456. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:16:38,184][00194] Avg episode reward: [(0, '22.608')] [2024-09-01 16:16:43,177][00194] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 888.6). Total num frames: 4517888. Throughput: 0: 183.9. Samples: 127542. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:16:43,183][00194] Avg episode reward: [(0, '21.767')] [2024-09-01 16:16:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4526080. Throughput: 0: 198.0. Samples: 128536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:16:48,184][00194] Avg episode reward: [(0, '21.312')] [2024-09-01 16:16:51,127][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001106_4530176.pth... [2024-09-01 16:16:51,247][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001054_4317184.pth [2024-09-01 16:16:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 4530176. Throughput: 0: 193.5. Samples: 129996. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2024-09-01 16:16:53,190][00194] Avg episode reward: [(0, '21.431')] [2024-09-01 16:16:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4534272. Throughput: 0: 199.9. Samples: 130954. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:16:58,182][00194] Avg episode reward: [(0, '21.104')]
[2024-09-01 16:17:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4538368. Throughput: 0: 207.7. Samples: 131642. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:17:03,179][00194] Avg episode reward: [(0, '20.777')]
[2024-09-01 16:17:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4542464. Throughput: 0: 229.0. Samples: 133346. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:17:08,180][00194] Avg episode reward: [(0, '21.631')]
[2024-09-01 16:17:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4546560. Throughput: 0: 224.0. Samples: 134448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:17:13,186][00194] Avg episode reward: [(0, '21.594')]
[2024-09-01 16:17:14,987][26015] Updated weights for policy 0, policy_version 1111 (0.1004)
[2024-09-01 16:17:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4550656. Throughput: 0: 213.6. Samples: 134920. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:17:18,185][00194] Avg episode reward: [(0, '21.538')]
[2024-09-01 16:17:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4554752. Throughput: 0: 224.7. Samples: 136568. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:17:23,185][00194] Avg episode reward: [(0, '22.171')]
[2024-09-01 16:17:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4562944. Throughput: 0: 231.0. Samples: 137936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:17:28,179][00194] Avg episode reward: [(0, '22.637')]
[2024-09-01 16:17:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 4562944. Throughput: 0: 225.6. Samples: 138686.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:17:33,185][00194] Avg episode reward: [(0, '22.898')]
[2024-09-01 16:17:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4571136. Throughput: 0: 216.4. Samples: 139736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:38,180][00194] Avg episode reward: [(0, '23.147')]
[2024-09-01 16:17:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4575232. Throughput: 0: 225.0. Samples: 141078. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:43,180][00194] Avg episode reward: [(0, '23.221')]
[2024-09-01 16:17:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4579328. Throughput: 0: 229.0. Samples: 141946. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:48,184][00194] Avg episode reward: [(0, '23.405')]
[2024-09-01 16:17:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4583424. Throughput: 0: 215.0. Samples: 143022. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:17:53,182][00194] Avg episode reward: [(0, '23.405')]
[2024-09-01 16:17:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4587520. Throughput: 0: 226.2. Samples: 144626. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:17:58,186][00194] Avg episode reward: [(0, '23.392')]
[2024-09-01 16:18:00,196][26015] Updated weights for policy 0, policy_version 1121 (0.0537)
[2024-09-01 16:18:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4591616. Throughput: 0: 231.2. Samples: 145324. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:18:03,186][00194] Avg episode reward: [(0, '23.385')]
[2024-09-01 16:18:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4595712. Throughput: 0: 225.6. Samples: 146718.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:18:08,180][00194] Avg episode reward: [(0, '23.552')]
[2024-09-01 16:18:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4599808. Throughput: 0: 218.5. Samples: 147768. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:18:13,182][00194] Avg episode reward: [(0, '23.662')]
[2024-09-01 16:18:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4608000. Throughput: 0: 222.4. Samples: 148692. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:18:18,189][00194] Avg episode reward: [(0, '23.160')]
[2024-09-01 16:18:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4612096. Throughput: 0: 229.2. Samples: 150052. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:18:23,180][00194] Avg episode reward: [(0, '23.275')]
[2024-09-01 16:18:28,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4616192. Throughput: 0: 223.0. Samples: 151112. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:18:28,184][00194] Avg episode reward: [(0, '23.210')]
[2024-09-01 16:18:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4620288. Throughput: 0: 223.3. Samples: 151996. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:18:33,180][00194] Avg episode reward: [(0, '23.776')]
[2024-09-01 16:18:38,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4624384. Throughput: 0: 226.7. Samples: 153224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:18:38,179][00194] Avg episode reward: [(0, '23.531')]
[2024-09-01 16:18:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4628480. Throughput: 0: 226.6. Samples: 154824.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:18:43,185][00194] Avg episode reward: [(0, '24.070')]
[2024-09-01 16:18:45,617][26015] Updated weights for policy 0, policy_version 1131 (0.1686)
[2024-09-01 16:18:48,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4632576. Throughput: 0: 218.0. Samples: 155136. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:18:48,182][00194] Avg episode reward: [(0, '25.152')]
[2024-09-01 16:18:49,159][26002] Signal inference workers to stop experience collection... (150 times)
[2024-09-01 16:18:49,251][26015] InferenceWorker_p0-w0: stopping experience collection (150 times)
[2024-09-01 16:18:50,348][26002] Signal inference workers to resume experience collection... (150 times)
[2024-09-01 16:18:50,349][26015] InferenceWorker_p0-w0: resuming experience collection (150 times)
[2024-09-01 16:18:50,362][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001132_4636672.pth...
[2024-09-01 16:18:50,482][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001081_4427776.pth
[2024-09-01 16:18:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4636672. Throughput: 0: 222.9. Samples: 156748. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:18:53,179][00194] Avg episode reward: [(0, '25.359')]
[2024-09-01 16:18:58,177][00194] Fps is (10 sec: 1229.0, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4644864. Throughput: 0: 229.2. Samples: 158082. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:18:58,185][00194] Avg episode reward: [(0, '25.478')]
[2024-09-01 16:19:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4644864. Throughput: 0: 225.7. Samples: 158848.
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:19:03,185][00194] Avg episode reward: [(0, '25.145')]
[2024-09-01 16:19:08,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4648960. Throughput: 0: 221.0. Samples: 159998. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:19:08,179][00194] Avg episode reward: [(0, '24.893')]
[2024-09-01 16:19:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4657152. Throughput: 0: 229.8. Samples: 161454. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:13,180][00194] Avg episode reward: [(0, '24.973')]
[2024-09-01 16:19:18,180][00194] Fps is (10 sec: 1228.5, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4661248. Throughput: 0: 227.1. Samples: 162216. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:18,190][00194] Avg episode reward: [(0, '24.829')]
[2024-09-01 16:19:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4665344. Throughput: 0: 223.3. Samples: 163272. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:23,185][00194] Avg episode reward: [(0, '23.987')]
[2024-09-01 16:19:28,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4669440. Throughput: 0: 223.1. Samples: 164862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:28,180][00194] Avg episode reward: [(0, '23.948')]
[2024-09-01 16:19:30,568][26015] Updated weights for policy 0, policy_version 1141 (0.0525)
[2024-09-01 16:19:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4673536. Throughput: 0: 230.9. Samples: 165524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:33,188][00194] Avg episode reward: [(0, '23.924')]
[2024-09-01 16:19:38,180][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 4677632. Throughput: 0: 224.2. Samples: 166836.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:38,185][00194] Avg episode reward: [(0, '23.835')]
[2024-09-01 16:19:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4681728. Throughput: 0: 222.2. Samples: 168080. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:43,180][00194] Avg episode reward: [(0, '23.820')]
[2024-09-01 16:19:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4685824. Throughput: 0: 223.5. Samples: 168904. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:48,180][00194] Avg episode reward: [(0, '25.249')]
[2024-09-01 16:19:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4694016. Throughput: 0: 230.3. Samples: 170362. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:53,183][00194] Avg episode reward: [(0, '25.912')]
[2024-09-01 16:19:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 4694016. Throughput: 0: 220.9. Samples: 171394. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:19:58,182][00194] Avg episode reward: [(0, '25.991')]
[2024-09-01 16:20:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4702208. Throughput: 0: 219.8. Samples: 172108. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:03,186][00194] Avg episode reward: [(0, '26.219')]
[2024-09-01 16:20:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4706304. Throughput: 0: 226.5. Samples: 173464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:08,180][00194] Avg episode reward: [(0, '25.631')]
[2024-09-01 16:20:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4710400. Throughput: 0: 227.4. Samples: 175094.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:13,180][00194] Avg episode reward: [(0, '26.296')]
[2024-09-01 16:20:16,548][26015] Updated weights for policy 0, policy_version 1151 (0.1530)
[2024-09-01 16:20:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4714496. Throughput: 0: 222.3. Samples: 175526. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:18,183][00194] Avg episode reward: [(0, '26.681')]
[2024-09-01 16:20:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4718592. Throughput: 0: 224.0. Samples: 176914. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:23,181][00194] Avg episode reward: [(0, '26.178')]
[2024-09-01 16:20:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4722688. Throughput: 0: 230.8. Samples: 178464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:28,180][00194] Avg episode reward: [(0, '26.088')]
[2024-09-01 16:20:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4726784. Throughput: 0: 228.2. Samples: 179174. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:33,180][00194] Avg episode reward: [(0, '25.861')]
[2024-09-01 16:20:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 4730880. Throughput: 0: 217.6. Samples: 180152. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:38,180][00194] Avg episode reward: [(0, '26.192')]
[2024-09-01 16:20:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4739072. Throughput: 0: 231.6. Samples: 181818. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:20:43,179][00194] Avg episode reward: [(0, '27.073')]
[2024-09-01 16:20:46,499][26002] Saving new best policy, reward=27.073!
[2024-09-01 16:20:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6).
Total num frames: 4743168. Throughput: 0: 235.6. Samples: 182710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:48,184][00194] Avg episode reward: [(0, '26.905')]
[2024-09-01 16:20:52,211][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001159_4747264.pth...
[2024-09-01 16:20:52,288][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001106_4530176.pth
[2024-09-01 16:20:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4747264. Throughput: 0: 226.1. Samples: 183638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:53,188][00194] Avg episode reward: [(0, '26.621')]
[2024-09-01 16:20:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 4751360. Throughput: 0: 222.8. Samples: 185118. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:20:58,185][00194] Avg episode reward: [(0, '26.404')]
[2024-09-01 16:21:00,777][26015] Updated weights for policy 0, policy_version 1161 (0.0669)
[2024-09-01 16:21:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4755456. Throughput: 0: 228.1. Samples: 185790. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:03,198][00194] Avg episode reward: [(0, '26.060')]
[2024-09-01 16:21:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4759552. Throughput: 0: 228.3. Samples: 187186. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:21:08,184][00194] Avg episode reward: [(0, '26.880')]
[2024-09-01 16:21:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4763648. Throughput: 0: 220.7. Samples: 188396. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:21:13,181][00194] Avg episode reward: [(0, '27.494')]
[2024-09-01 16:21:15,305][26002] Saving new best policy, reward=27.494!
[2024-09-01 16:21:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4767744. Throughput: 0: 222.0. Samples: 189162. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:21:18,189][00194] Avg episode reward: [(0, '27.752')]
[2024-09-01 16:21:23,044][26002] Saving new best policy, reward=27.752!
[2024-09-01 16:21:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4775936. Throughput: 0: 238.4. Samples: 190880. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:23,180][00194] Avg episode reward: [(0, '28.043')]
[2024-09-01 16:21:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4775936. Throughput: 0: 223.1. Samples: 191856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:28,180][00194] Avg episode reward: [(0, '28.605')]
[2024-09-01 16:21:28,766][26002] Saving new best policy, reward=28.043!
[2024-09-01 16:21:33,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4780032. Throughput: 0: 215.9. Samples: 192426. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:33,185][00194] Avg episode reward: [(0, '28.438')]
[2024-09-01 16:21:33,496][26002] Saving new best policy, reward=28.605!
[2024-09-01 16:21:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4788224. Throughput: 0: 227.5. Samples: 193876. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:38,183][00194] Avg episode reward: [(0, '28.370')]
[2024-09-01 16:21:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4792320. Throughput: 0: 218.1. Samples: 194932. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:43,184][00194] Avg episode reward: [(0, '28.777')]
[2024-09-01 16:21:46,828][26002] Saving new best policy, reward=28.777!
[2024-09-01 16:21:46,833][26015] Updated weights for policy 0, policy_version 1171 (0.0621)
[2024-09-01 16:21:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4796416. Throughput: 0: 225.7. Samples: 195948. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:48,180][00194] Avg episode reward: [(0, '28.218')]
[2024-09-01 16:21:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4800512. Throughput: 0: 221.9. Samples: 197170. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:53,184][00194] Avg episode reward: [(0, '27.670')]
[2024-09-01 16:21:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4804608. Throughput: 0: 237.2. Samples: 199070. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:21:58,179][00194] Avg episode reward: [(0, '27.562')]
[2024-09-01 16:22:03,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4808704. Throughput: 0: 225.7. Samples: 199320. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:03,182][00194] Avg episode reward: [(0, '27.447')]
[2024-09-01 16:22:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4812800. Throughput: 0: 214.6. Samples: 200536. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:08,179][00194] Avg episode reward: [(0, '27.085')]
[2024-09-01 16:22:13,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4816896. Throughput: 0: 229.4. Samples: 202180. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:13,186][00194] Avg episode reward: [(0, '27.707')]
[2024-09-01 16:22:18,180][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4825088. Throughput: 0: 235.8. Samples: 203036.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:18,183][00194] Avg episode reward: [(0, '27.036')]
[2024-09-01 16:22:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 4825088. Throughput: 0: 232.8. Samples: 204354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:22:23,180][00194] Avg episode reward: [(0, '27.002')]
[2024-09-01 16:22:28,177][00194] Fps is (10 sec: 819.5, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4833280. Throughput: 0: 233.3. Samples: 205430. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:22:28,186][00194] Avg episode reward: [(0, '27.165')]
[2024-09-01 16:22:31,927][26015] Updated weights for policy 0, policy_version 1181 (0.1670)
[2024-09-01 16:22:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4837376. Throughput: 0: 233.2. Samples: 206442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:22:33,180][00194] Avg episode reward: [(0, '27.113')]
[2024-09-01 16:22:34,243][26002] Signal inference workers to stop experience collection... (200 times)
[2024-09-01 16:22:34,308][26015] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2024-09-01 16:22:35,711][26002] Signal inference workers to resume experience collection... (200 times)
[2024-09-01 16:22:35,713][26015] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2024-09-01 16:22:38,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4841472. Throughput: 0: 232.7. Samples: 207642. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:38,190][00194] Avg episode reward: [(0, '26.311')]
[2024-09-01 16:22:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4845568. Throughput: 0: 217.6. Samples: 208860.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:43,180][00194] Avg episode reward: [(0, '25.657')]
[2024-09-01 16:22:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4849664. Throughput: 0: 225.9. Samples: 209486. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:48,181][00194] Avg episode reward: [(0, '25.622')]
[2024-09-01 16:22:50,066][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001185_4853760.pth...
[2024-09-01 16:22:50,182][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001132_4636672.pth
[2024-09-01 16:22:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4853760. Throughput: 0: 240.2. Samples: 211344. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:22:53,185][00194] Avg episode reward: [(0, '25.926')]
[2024-09-01 16:22:58,180][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4857856. Throughput: 0: 226.4. Samples: 212368. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:22:58,188][00194] Avg episode reward: [(0, '25.848')]
[2024-09-01 16:23:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4861952. Throughput: 0: 219.8. Samples: 212928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:23:03,180][00194] Avg episode reward: [(0, '25.533')]
[2024-09-01 16:23:08,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4866048. Throughput: 0: 228.4. Samples: 214634. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:23:08,184][00194] Avg episode reward: [(0, '24.453')]
[2024-09-01 16:23:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4874240. Throughput: 0: 234.2. Samples: 215968.
Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:23:13,181][00194] Avg episode reward: [(0, '23.632')]
[2024-09-01 16:23:17,650][26015] Updated weights for policy 0, policy_version 1191 (0.1826)
[2024-09-01 16:23:18,177][00194] Fps is (10 sec: 1228.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4878336. Throughput: 0: 226.5. Samples: 216634. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:23:18,184][00194] Avg episode reward: [(0, '23.473')]
[2024-09-01 16:23:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4882432. Throughput: 0: 221.3. Samples: 217600. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:23:23,180][00194] Avg episode reward: [(0, '24.007')]
[2024-09-01 16:23:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4886528. Throughput: 0: 235.7. Samples: 219466. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:23:28,180][00194] Avg episode reward: [(0, '24.305')]
[2024-09-01 16:23:33,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4890624. Throughput: 0: 229.4. Samples: 219810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:33,182][00194] Avg episode reward: [(0, '24.018')]
[2024-09-01 16:23:38,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4894720. Throughput: 0: 215.8. Samples: 221054. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:38,186][00194] Avg episode reward: [(0, '24.131')]
[2024-09-01 16:23:43,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4898816. Throughput: 0: 229.1. Samples: 222678. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:43,179][00194] Avg episode reward: [(0, '23.711')]
[2024-09-01 16:23:48,177][00194] Fps is (10 sec: 1229.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4907008. Throughput: 0: 232.4. Samples: 223386.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:48,180][00194] Avg episode reward: [(0, '24.096')]
[2024-09-01 16:23:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 4907008. Throughput: 0: 224.2. Samples: 224722. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:53,180][00194] Avg episode reward: [(0, '24.100')]
[2024-09-01 16:23:58,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4911104. Throughput: 0: 217.7. Samples: 225764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:23:58,185][00194] Avg episode reward: [(0, '24.015')]
[2024-09-01 16:24:02,535][26015] Updated weights for policy 0, policy_version 1201 (0.2548)
[2024-09-01 16:24:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4919296. Throughput: 0: 224.9. Samples: 226756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:03,180][00194] Avg episode reward: [(0, '24.153')]
[2024-09-01 16:24:08,179][00194] Fps is (10 sec: 1228.5, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4923392. Throughput: 0: 235.8. Samples: 228210. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:08,186][00194] Avg episode reward: [(0, '24.533')]
[2024-09-01 16:24:13,182][00194] Fps is (10 sec: 818.8, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 4927488. Throughput: 0: 216.9. Samples: 229228. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:13,190][00194] Avg episode reward: [(0, '24.135')]
[2024-09-01 16:24:18,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4931584. Throughput: 0: 223.7. Samples: 229878. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:18,180][00194] Avg episode reward: [(0, '24.265')]
[2024-09-01 16:24:23,177][00194] Fps is (10 sec: 819.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4935680. Throughput: 0: 237.3. Samples: 231730.
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:23,180][00194] Avg episode reward: [(0, '25.074')]
[2024-09-01 16:24:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4939776. Throughput: 0: 227.8. Samples: 232930. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:28,180][00194] Avg episode reward: [(0, '25.369')]
[2024-09-01 16:24:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4943872. Throughput: 0: 220.6. Samples: 233312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:33,186][00194] Avg episode reward: [(0, '25.850')]
[2024-09-01 16:24:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.8, 300 sec: 916.4). Total num frames: 4952064. Throughput: 0: 227.4. Samples: 234956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:38,182][00194] Avg episode reward: [(0, '25.851')]
[2024-09-01 16:24:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4956160. Throughput: 0: 235.2. Samples: 236346. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:43,184][00194] Avg episode reward: [(0, '26.430')]
[2024-09-01 16:24:47,505][26015] Updated weights for policy 0, policy_version 1211 (0.0542)
[2024-09-01 16:24:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4960256. Throughput: 0: 228.0. Samples: 237016. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:24:48,180][00194] Avg episode reward: [(0, '26.794')]
[2024-09-01 16:24:52,208][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001212_4964352.pth...
[2024-09-01 16:24:52,323][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001159_4747264.pth
[2024-09-01 16:24:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4964352. Throughput: 0: 217.7. Samples: 238006.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:24:53,180][00194] Avg episode reward: [(0, '26.405')]
[2024-09-01 16:24:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 4968448. Throughput: 0: 232.1. Samples: 239670. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:24:58,180][00194] Avg episode reward: [(0, '25.465')]
[2024-09-01 16:25:03,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4972544. Throughput: 0: 232.1. Samples: 240322. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:03,184][00194] Avg episode reward: [(0, '25.199')]
[2024-09-01 16:25:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4976640. Throughput: 0: 214.8. Samples: 241394. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:08,186][00194] Avg episode reward: [(0, '25.529')]
[2024-09-01 16:25:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4980736. Throughput: 0: 226.1. Samples: 243106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:13,179][00194] Avg episode reward: [(0, '25.556')]
[2024-09-01 16:25:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 4988928. Throughput: 0: 236.1. Samples: 243938. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:18,180][00194] Avg episode reward: [(0, '26.311')]
[2024-09-01 16:25:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4988928. Throughput: 0: 224.8. Samples: 245072. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:23,180][00194] Avg episode reward: [(0, '25.548')]
[2024-09-01 16:25:28,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 4993024. Throughput: 0: 219.0. Samples: 246200.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:28,179][00194] Avg episode reward: [(0, '25.154')]
[2024-09-01 16:25:32,397][26015] Updated weights for policy 0, policy_version 1221 (0.1037)
[2024-09-01 16:25:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5001216. Throughput: 0: 223.4. Samples: 247070. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:33,180][00194] Avg episode reward: [(0, '25.424')]
[2024-09-01 16:25:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5005312. Throughput: 0: 228.9. Samples: 248306. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:38,181][00194] Avg episode reward: [(0, '24.996')]
[2024-09-01 16:25:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5009408. Throughput: 0: 219.6. Samples: 249550. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:43,182][00194] Avg episode reward: [(0, '25.339')]
[2024-09-01 16:25:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5013504. Throughput: 0: 220.8. Samples: 250260. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:48,185][00194] Avg episode reward: [(0, '25.125')]
[2024-09-01 16:25:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5017600. Throughput: 0: 237.2. Samples: 252066. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:53,180][00194] Avg episode reward: [(0, '25.237')]
[2024-09-01 16:25:58,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5021696. Throughput: 0: 227.4. Samples: 253340. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:25:58,180][00194] Avg episode reward: [(0, '25.443')]
[2024-09-01 16:26:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5025792. Throughput: 0: 216.5. Samples: 253682.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:03,184][00194] Avg episode reward: [(0, '24.973')] [2024-09-01 16:26:08,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5029888. Throughput: 0: 229.0. Samples: 255378. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:08,185][00194] Avg episode reward: [(0, '25.326')] [2024-09-01 16:26:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5038080. Throughput: 0: 233.8. Samples: 256720. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:13,184][00194] Avg episode reward: [(0, '25.131')] [2024-09-01 16:26:17,755][26015] Updated weights for policy 0, policy_version 1231 (0.0622) [2024-09-01 16:26:18,178][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5042176. Throughput: 0: 230.3. Samples: 257432. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:18,182][00194] Avg episode reward: [(0, '24.910')] [2024-09-01 16:26:21,209][26002] Signal inference workers to stop experience collection... (250 times) [2024-09-01 16:26:21,250][26015] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-09-01 16:26:22,397][26002] Signal inference workers to resume experience collection... (250 times) [2024-09-01 16:26:22,398][26015] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-09-01 16:26:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5046272. Throughput: 0: 226.0. Samples: 258476. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:23,185][00194] Avg episode reward: [(0, '24.996')] [2024-09-01 16:26:28,177][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5050368. Throughput: 0: 238.2. Samples: 260268. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:28,179][00194] Avg episode reward: [(0, '25.330')] [2024-09-01 16:26:33,181][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5054464. Throughput: 0: 234.2. Samples: 260800. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:33,188][00194] Avg episode reward: [(0, '24.743')] [2024-09-01 16:26:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5058560. Throughput: 0: 217.7. Samples: 261862. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:38,180][00194] Avg episode reward: [(0, '24.999')] [2024-09-01 16:26:43,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5062656. Throughput: 0: 226.2. Samples: 263518. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:43,181][00194] Avg episode reward: [(0, '24.556')] [2024-09-01 16:26:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5070848. Throughput: 0: 233.8. Samples: 264204. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:48,179][00194] Avg episode reward: [(0, '24.914')] [2024-09-01 16:26:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5070848. Throughput: 0: 227.9. Samples: 265632. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:53,180][00194] Avg episode reward: [(0, '24.642')] [2024-09-01 16:26:53,769][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth... [2024-09-01 16:26:53,847][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001185_4853760.pth [2024-09-01 16:26:58,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5074944. Throughput: 0: 220.5. Samples: 266642. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:26:58,185][00194] Avg episode reward: [(0, '25.300')] [2024-09-01 16:27:02,466][26015] Updated weights for policy 0, policy_version 1241 (0.1041) [2024-09-01 16:27:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5083136. Throughput: 0: 228.3. Samples: 267704. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:03,179][00194] Avg episode reward: [(0, '25.853')] [2024-09-01 16:27:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5087232. Throughput: 0: 228.9. Samples: 268778. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:08,184][00194] Avg episode reward: [(0, '26.317')] [2024-09-01 16:27:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5091328. Throughput: 0: 215.9. Samples: 269982. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:13,180][00194] Avg episode reward: [(0, '26.383')] [2024-09-01 16:27:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5095424. Throughput: 0: 220.7. Samples: 270730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:18,179][00194] Avg episode reward: [(0, '26.575')] [2024-09-01 16:27:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5099520. Throughput: 0: 230.7. Samples: 272244. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:23,179][00194] Avg episode reward: [(0, '25.886')] [2024-09-01 16:27:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5103616. Throughput: 0: 211.2. Samples: 273020. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:28,181][00194] Avg episode reward: [(0, '26.109')] [2024-09-01 16:27:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5107712. Throughput: 0: 219.8. Samples: 274096. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:33,182][00194] Avg episode reward: [(0, '26.587')] [2024-09-01 16:27:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5111808. Throughput: 0: 225.7. Samples: 275788. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:27:38,179][00194] Avg episode reward: [(0, '25.520')] [2024-09-01 16:27:43,178][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5120000. Throughput: 0: 218.7. Samples: 276484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:27:43,182][00194] Avg episode reward: [(0, '26.484')] [2024-09-01 16:27:48,000][26015] Updated weights for policy 0, policy_version 1251 (0.1605) [2024-09-01 16:27:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5124096. Throughput: 0: 225.5. Samples: 277850. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:27:48,184][00194] Avg episode reward: [(0, '26.844')] [2024-09-01 16:27:53,177][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5128192. Throughput: 0: 226.1. Samples: 278952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:27:53,180][00194] Avg episode reward: [(0, '27.357')] [2024-09-01 16:27:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5132288. Throughput: 0: 235.6. Samples: 280584. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) [2024-09-01 16:27:58,180][00194] Avg episode reward: [(0, '27.600')] [2024-09-01 16:28:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5136384. Throughput: 0: 229.6. Samples: 281062. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:03,179][00194] Avg episode reward: [(0, '27.089')] [2024-09-01 16:28:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5140480. Throughput: 0: 223.0. Samples: 282278. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:08,185][00194] Avg episode reward: [(0, '26.846')] [2024-09-01 16:28:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5144576. Throughput: 0: 243.2. Samples: 283962. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:13,179][00194] Avg episode reward: [(0, '26.327')] [2024-09-01 16:28:18,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5148672. Throughput: 0: 235.3. Samples: 284686. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:18,187][00194] Avg episode reward: [(0, '26.687')] [2024-09-01 16:28:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5152768. Throughput: 0: 225.2. Samples: 285922. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:23,183][00194] Avg episode reward: [(0, '26.512')] [2024-09-01 16:28:28,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5156864. Throughput: 0: 237.1. Samples: 287152. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:28,179][00194] Avg episode reward: [(0, '26.852')] [2024-09-01 16:28:32,670][26015] Updated weights for policy 0, policy_version 1261 (0.1477) [2024-09-01 16:28:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5165056. Throughput: 0: 222.5. Samples: 287864. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:33,180][00194] Avg episode reward: [(0, '26.852')] [2024-09-01 16:28:38,180][00194] Fps is (10 sec: 1228.4, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5169152. Throughput: 0: 227.6. Samples: 289194. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:38,187][00194] Avg episode reward: [(0, '26.862')] [2024-09-01 16:28:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5173248. Throughput: 0: 220.3. Samples: 290496. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:28:43,184][00194] Avg episode reward: [(0, '27.232')] [2024-09-01 16:28:48,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5177344. Throughput: 0: 224.7. Samples: 291172. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:48,185][00194] Avg episode reward: [(0, '27.631')] [2024-09-01 16:28:50,752][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001265_5181440.pth... [2024-09-01 16:28:50,864][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001212_4964352.pth [2024-09-01 16:28:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5181440. Throughput: 0: 233.2. Samples: 292770. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:28:53,184][00194] Avg episode reward: [(0, '27.701')] [2024-09-01 16:28:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5185536. Throughput: 0: 228.1. Samples: 294226. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:28:58,180][00194] Avg episode reward: [(0, '27.442')] [2024-09-01 16:29:03,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5189632. Throughput: 0: 218.4. Samples: 294512. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:03,189][00194] Avg episode reward: [(0, '27.442')] [2024-09-01 16:29:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5193728. Throughput: 0: 223.6. Samples: 295986. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:08,179][00194] Avg episode reward: [(0, '28.054')] [2024-09-01 16:29:13,177][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5201920. Throughput: 0: 227.2. Samples: 297374. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:13,184][00194] Avg episode reward: [(0, '27.852')] [2024-09-01 16:29:17,923][26015] Updated weights for policy 0, policy_version 1271 (0.1976) [2024-09-01 16:29:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5206016. Throughput: 0: 228.8. Samples: 298162. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:18,181][00194] Avg episode reward: [(0, '28.579')] [2024-09-01 16:29:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5210112. Throughput: 0: 227.1. Samples: 299414. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:23,179][00194] Avg episode reward: [(0, '28.837')] [2024-09-01 16:29:26,920][26002] Saving new best policy, reward=28.837! [2024-09-01 16:29:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5214208. Throughput: 0: 232.1. Samples: 300942. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:28,180][00194] Avg episode reward: [(0, '29.127')] [2024-09-01 16:29:30,759][26002] Saving new best policy, reward=29.127! [2024-09-01 16:29:33,183][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5218304. Throughput: 0: 231.8. Samples: 301604. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:33,186][00194] Avg episode reward: [(0, '29.376')] [2024-09-01 16:29:36,180][26002] Saving new best policy, reward=29.376! [2024-09-01 16:29:38,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5222400. Throughput: 0: 218.9. Samples: 302620. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:29:38,183][00194] Avg episode reward: [(0, '29.563')] [2024-09-01 16:29:41,129][26002] Saving new best policy, reward=29.563! [2024-09-01 16:29:43,177][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5226496. Throughput: 0: 223.8. Samples: 304298. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:29:43,185][00194] Avg episode reward: [(0, '28.905')] [2024-09-01 16:29:48,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5230592. Throughput: 0: 232.2. Samples: 304960. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:29:48,185][00194] Avg episode reward: [(0, '29.316')] [2024-09-01 16:29:53,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5234688. Throughput: 0: 227.9. Samples: 306240. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:53,184][00194] Avg episode reward: [(0, '29.042')] [2024-09-01 16:29:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5238784. Throughput: 0: 223.7. Samples: 307440. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:29:58,180][00194] Avg episode reward: [(0, '28.826')] [2024-09-01 16:30:03,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5242880. Throughput: 0: 223.8. Samples: 308232. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:30:03,184][00194] Avg episode reward: [(0, '28.151')] [2024-09-01 16:30:03,523][26015] Updated weights for policy 0, policy_version 1281 (0.0552) [2024-09-01 16:30:05,793][26002] Signal inference workers to stop experience collection... (300 times) [2024-09-01 16:30:05,836][26015] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-09-01 16:30:07,287][26002] Signal inference workers to resume experience collection... (300 times) [2024-09-01 16:30:07,289][26015] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-09-01 16:30:08,177][00194] Fps is (10 sec: 1228.7, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5251072. Throughput: 0: 227.9. Samples: 309668. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:08,180][00194] Avg episode reward: [(0, '28.279')] [2024-09-01 16:30:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5255168. Throughput: 0: 207.8. Samples: 310294. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:13,180][00194] Avg episode reward: [(0, '28.296')] [2024-09-01 16:30:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5259264. Throughput: 0: 217.1. Samples: 311372. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:18,187][00194] Avg episode reward: [(0, '27.743')] [2024-09-01 16:30:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5263360. Throughput: 0: 231.5. Samples: 313036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:23,185][00194] Avg episode reward: [(0, '27.846')] [2024-09-01 16:30:28,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5267456. Throughput: 0: 224.3. Samples: 314390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:28,182][00194] Avg episode reward: [(0, '27.703')] [2024-09-01 16:30:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5271552. Throughput: 0: 220.7. Samples: 314890. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:33,185][00194] Avg episode reward: [(0, '27.890')] [2024-09-01 16:30:38,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5275648. Throughput: 0: 226.4. Samples: 316426. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:38,180][00194] Avg episode reward: [(0, '27.052')] [2024-09-01 16:30:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5279744. Throughput: 0: 231.5. Samples: 317856. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:30:43,186][00194] Avg episode reward: [(0, '26.846')] [2024-09-01 16:30:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5283840. Throughput: 0: 232.7. Samples: 318702. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:48,179][00194] Avg episode reward: [(0, '26.407')] [2024-09-01 16:30:48,655][26015] Updated weights for policy 0, policy_version 1291 (0.2201) [2024-09-01 16:30:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5287936. Throughput: 0: 224.4. Samples: 319768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:53,186][00194] Avg episode reward: [(0, '26.771')] [2024-09-01 16:30:53,484][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_5292032.pth... [2024-09-01 16:30:53,591][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001239_5074944.pth [2024-09-01 16:30:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5296128. Throughput: 0: 241.4. Samples: 321156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:30:58,180][00194] Avg episode reward: [(0, '26.717')] [2024-09-01 16:31:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5300224. Throughput: 0: 234.9. Samples: 321944. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:03,179][00194] Avg episode reward: [(0, '26.465')] [2024-09-01 16:31:08,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5304320. Throughput: 0: 224.1. Samples: 323120. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:08,182][00194] Avg episode reward: [(0, '26.401')] [2024-09-01 16:31:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5308416. Throughput: 0: 229.1. Samples: 324698. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:13,182][00194] Avg episode reward: [(0, '26.210')] [2024-09-01 16:31:18,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5312512. Throughput: 0: 232.5. Samples: 325354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:18,185][00194] Avg episode reward: [(0, '25.831')] [2024-09-01 16:31:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5316608. Throughput: 0: 233.2. Samples: 326918. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:31:23,180][00194] Avg episode reward: [(0, '25.144')] [2024-09-01 16:31:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5320704. Throughput: 0: 224.7. Samples: 327966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2024-09-01 16:31:28,180][00194] Avg episode reward: [(0, '25.316')] [2024-09-01 16:31:33,076][26015] Updated weights for policy 0, policy_version 1301 (0.0550) [2024-09-01 16:31:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5328896. Throughput: 0: 220.6. Samples: 328630. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:33,179][00194] Avg episode reward: [(0, '25.962')] [2024-09-01 16:31:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5332992. Throughput: 0: 230.1. Samples: 330122. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:38,180][00194] Avg episode reward: [(0, '25.709')] [2024-09-01 16:31:43,181][00194] Fps is (10 sec: 818.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5337088. Throughput: 0: 221.4. Samples: 331122. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:43,191][00194] Avg episode reward: [(0, '25.805')] [2024-09-01 16:31:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5341184. Throughput: 0: 223.1. Samples: 331982. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:48,179][00194] Avg episode reward: [(0, '26.897')] [2024-09-01 16:31:53,177][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5345280. Throughput: 0: 233.8. Samples: 333642. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:31:53,186][00194] Avg episode reward: [(0, '27.249')] [2024-09-01 16:31:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5349376. Throughput: 0: 231.3. Samples: 335106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:31:58,184][00194] Avg episode reward: [(0, '26.684')] [2024-09-01 16:32:03,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5353472. Throughput: 0: 224.4. Samples: 335450. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:03,189][00194] Avg episode reward: [(0, '26.436')] [2024-09-01 16:32:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5357568. Throughput: 0: 223.6. Samples: 336978. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:08,180][00194] Avg episode reward: [(0, '26.042')] [2024-09-01 16:32:13,177][00194] Fps is (10 sec: 1228.9, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5365760. Throughput: 0: 231.1. Samples: 338364. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:13,180][00194] Avg episode reward: [(0, '26.699')] [2024-09-01 16:32:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5365760. Throughput: 0: 234.7. Samples: 339192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:18,180][00194] Avg episode reward: [(0, '26.955')] [2024-09-01 16:32:18,587][26015] Updated weights for policy 0, policy_version 1311 (0.1637) [2024-09-01 16:32:23,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5369856. Throughput: 0: 227.7. Samples: 340368. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:32:23,186][00194] Avg episode reward: [(0, '26.630')] [2024-09-01 16:32:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5378048. Throughput: 0: 236.6. Samples: 341768. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:28,183][00194] Avg episode reward: [(0, '26.073')] [2024-09-01 16:32:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5382144. Throughput: 0: 231.7. Samples: 342408. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:33,180][00194] Avg episode reward: [(0, '25.973')] [2024-09-01 16:32:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5386240. Throughput: 0: 220.3. Samples: 343554. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:32:38,180][00194] Avg episode reward: [(0, '25.922')] [2024-09-01 16:32:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5390336. Throughput: 0: 221.9. Samples: 345090. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:43,180][00194] Avg episode reward: [(0, '26.415')] [2024-09-01 16:32:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5394432. Throughput: 0: 224.7. Samples: 345560. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:48,180][00194] Avg episode reward: [(0, '25.910')] [2024-09-01 16:32:49,251][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001318_5398528.pth... [2024-09-01 16:32:49,362][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001265_5181440.pth [2024-09-01 16:32:53,180][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5398528. Throughput: 0: 227.6. Samples: 347220. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:53,185][00194] Avg episode reward: [(0, '25.335')] [2024-09-01 16:32:58,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5402624. Throughput: 0: 221.5. Samples: 348330. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:32:58,186][00194] Avg episode reward: [(0, '25.732')] [2024-09-01 16:33:03,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5406720. Throughput: 0: 217.0. Samples: 348956. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:03,183][00194] Avg episode reward: [(0, '25.498')] [2024-09-01 16:33:03,571][26015] Updated weights for policy 0, policy_version 1321 (0.1594) [2024-09-01 16:33:08,178][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5414912. Throughput: 0: 227.3. Samples: 350596. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:08,188][00194] Avg episode reward: [(0, '25.190')] [2024-09-01 16:33:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5419008. Throughput: 0: 219.7. Samples: 351656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:13,183][00194] Avg episode reward: [(0, '25.469')] [2024-09-01 16:33:18,177][00194] Fps is (10 sec: 819.3, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5423104. Throughput: 0: 223.4. Samples: 352462. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:18,187][00194] Avg episode reward: [(0, '26.090')] [2024-09-01 16:33:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5427200. Throughput: 0: 226.1. Samples: 353730. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) [2024-09-01 16:33:23,180][00194] Avg episode reward: [(0, '26.379')] [2024-09-01 16:33:28,180][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5431296. Throughput: 0: 229.7. Samples: 355428. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:33:28,182][00194] Avg episode reward: [(0, '26.392')] [2024-09-01 16:33:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5435392. Throughput: 0: 227.3. Samples: 355790. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) [2024-09-01 16:33:33,185][00194] Avg episode reward: [(0, '25.307')] [2024-09-01 16:33:38,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5439488. Throughput: 0: 224.9. Samples: 357338. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:33:38,190][00194] Avg episode reward: [(0, '25.175')] [2024-09-01 16:33:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5443584. Throughput: 0: 233.1. Samples: 358820. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) [2024-09-01 16:33:43,186][00194] Avg episode reward: [(0, '24.816')] [2024-09-01 16:33:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5447680. Throughput: 0: 236.6. Samples: 359604. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:33:48,180][00194] Avg episode reward: [(0, '24.875')] [2024-09-01 16:33:48,923][26015] Updated weights for policy 0, policy_version 1331 (0.1040) [2024-09-01 16:33:52,544][26002] Signal inference workers to stop experience collection... (350 times) [2024-09-01 16:33:52,592][26015] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-09-01 16:33:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5451776. Throughput: 0: 224.4. Samples: 360694. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) [2024-09-01 16:33:53,186][00194] Avg episode reward: [(0, '24.882')] [2024-09-01 16:33:53,817][26002] Signal inference workers to resume experience collection... 
(350 times)
[2024-09-01 16:33:53,818][26015] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2024-09-01 16:33:58,178][00194] Fps is (10 sec: 1228.6, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5459968. Throughput: 0: 233.2. Samples: 362152. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:33:58,184][00194] Avg episode reward: [(0, '25.407')]
[2024-09-01 16:34:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5464064. Throughput: 0: 232.4. Samples: 362920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:34:03,187][00194] Avg episode reward: [(0, '24.828')]
[2024-09-01 16:34:08,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5468160. Throughput: 0: 227.6. Samples: 363972. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:34:08,182][00194] Avg episode reward: [(0, '24.905')]
[2024-09-01 16:34:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5472256. Throughput: 0: 218.9. Samples: 365276. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:34:13,179][00194] Avg episode reward: [(0, '25.595')]
[2024-09-01 16:34:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5476352. Throughput: 0: 229.0. Samples: 366094. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:34:18,181][00194] Avg episode reward: [(0, '25.096')]
[2024-09-01 16:34:23,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5480448. Throughput: 0: 229.7. Samples: 367674. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:34:23,182][00194] Avg episode reward: [(0, '24.877')]
[2024-09-01 16:34:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5484544. Throughput: 0: 220.9. Samples: 368760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:34:28,179][00194] Avg episode reward: [(0, '25.541')]
[2024-09-01 16:34:33,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5488640. Throughput: 0: 214.0. Samples: 369236. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0)
[2024-09-01 16:34:33,180][00194] Avg episode reward: [(0, '25.725')]
[2024-09-01 16:34:33,851][26015] Updated weights for policy 0, policy_version 1341 (0.1523)
[2024-09-01 16:34:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5496832. Throughput: 0: 229.5. Samples: 371022. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:34:38,180][00194] Avg episode reward: [(0, '25.720')]
[2024-09-01 16:34:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5496832. Throughput: 0: 221.5. Samples: 372120. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:34:43,179][00194] Avg episode reward: [(0, '25.858')]
[2024-09-01 16:34:48,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5500928. Throughput: 0: 219.4. Samples: 372794. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0)
[2024-09-01 16:34:48,180][00194] Avg episode reward: [(0, '26.018')]
[2024-09-01 16:34:52,017][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth...
[2024-09-01 16:34:52,131][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001292_5292032.pth
[2024-09-01 16:34:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5509120. Throughput: 0: 227.4. Samples: 374206. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:34:53,180][00194] Avg episode reward: [(0, '26.609')]
[2024-09-01 16:34:58,178][00194] Fps is (10 sec: 1228.7, 60 sec: 887.5, 300 sec: 916.4). Total num frames: 5513216. Throughput: 0: 231.2. Samples: 375680. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:34:58,190][00194] Avg episode reward: [(0, '26.783')]
[2024-09-01 16:35:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5517312. Throughput: 0: 226.6. Samples: 376292. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:03,183][00194] Avg episode reward: [(0, '26.419')]
[2024-09-01 16:35:08,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5521408. Throughput: 0: 214.1. Samples: 377310. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:08,186][00194] Avg episode reward: [(0, '26.419')]
[2024-09-01 16:35:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5525504. Throughput: 0: 234.5. Samples: 379312. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:13,183][00194] Avg episode reward: [(0, '27.281')]
[2024-09-01 16:35:18,179][00194] Fps is (10 sec: 819.0, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5529600. Throughput: 0: 232.7. Samples: 379710. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:18,184][00194] Avg episode reward: [(0, '26.924')]
[2024-09-01 16:35:20,389][26015] Updated weights for policy 0, policy_version 1351 (0.0541)
[2024-09-01 16:35:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5533696. Throughput: 0: 215.7. Samples: 380728. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:23,181][00194] Avg episode reward: [(0, '26.941')]
[2024-09-01 16:35:28,177][00194] Fps is (10 sec: 819.4, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5537792. Throughput: 0: 226.6. Samples: 382318. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:28,186][00194] Avg episode reward: [(0, '26.888')]
[2024-09-01 16:35:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5545984. Throughput: 0: 229.3. Samples: 383112. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:33,181][00194] Avg episode reward: [(0, '27.257')]
[2024-09-01 16:35:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 902.5). Total num frames: 5545984. Throughput: 0: 225.2. Samples: 384342. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:35:38,180][00194] Avg episode reward: [(0, '27.257')]
[2024-09-01 16:35:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5554176. Throughput: 0: 217.3. Samples: 385458. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:35:43,179][00194] Avg episode reward: [(0, '27.668')]
[2024-09-01 16:35:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5558272. Throughput: 0: 224.6. Samples: 386400. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:35:48,180][00194] Avg episode reward: [(0, '27.325')]
[2024-09-01 16:35:53,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5558272. Throughput: 0: 225.4. Samples: 387454. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:35:53,185][00194] Avg episode reward: [(0, '27.561')]
[2024-09-01 16:35:58,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5562368. Throughput: 0: 181.6. Samples: 387486. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:35:58,180][00194] Avg episode reward: [(0, '27.645')]
[2024-09-01 16:36:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5566464. Throughput: 0: 193.9. Samples: 388434. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:36:03,184][00194] Avg episode reward: [(0, '27.329')]
[2024-09-01 16:36:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5570560. Throughput: 0: 204.4. Samples: 389926. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:08,184][00194] Avg episode reward: [(0, '26.799')]
[2024-09-01 16:36:10,656][26015] Updated weights for policy 0, policy_version 1361 (0.3310)
[2024-09-01 16:36:13,178][00194] Fps is (10 sec: 819.1, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5574656. Throughput: 0: 200.3. Samples: 391330. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:13,181][00194] Avg episode reward: [(0, '26.127')]
[2024-09-01 16:36:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5578752. Throughput: 0: 191.1. Samples: 391712. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:18,181][00194] Avg episode reward: [(0, '26.861')]
[2024-09-01 16:36:23,177][00194] Fps is (10 sec: 819.3, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5582848. Throughput: 0: 193.0. Samples: 393028. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:23,179][00194] Avg episode reward: [(0, '26.413')]
[2024-09-01 16:36:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 5586944. Throughput: 0: 204.6. Samples: 394666. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:36:28,185][00194] Avg episode reward: [(0, '26.205')]
[2024-09-01 16:36:33,180][00194] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 874.7). Total num frames: 5591040. Throughput: 0: 203.0. Samples: 395536. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:36:33,183][00194] Avg episode reward: [(0, '26.133')]
[2024-09-01 16:36:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.8). Total num frames: 5595136. Throughput: 0: 201.6. Samples: 396526. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:38,180][00194] Avg episode reward: [(0, '26.493')]
[2024-09-01 16:36:43,177][00194] Fps is (10 sec: 1229.2, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5603328. Throughput: 0: 212.4. Samples: 397046. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:43,180][00194] Avg episode reward: [(0, '26.270')]
[2024-09-01 16:36:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 888.6). Total num frames: 5607424. Throughput: 0: 229.1. Samples: 398742. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:48,184][00194] Avg episode reward: [(0, '26.885')]
[2024-09-01 16:36:52,435][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001370_5611520.pth...
[2024-09-01 16:36:52,584][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001318_5398528.pth
[2024-09-01 16:36:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5611520. Throughput: 0: 219.0. Samples: 399782. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:53,184][00194] Avg episode reward: [(0, '27.118')]
[2024-09-01 16:36:57,312][26015] Updated weights for policy 0, policy_version 1371 (0.2019)
[2024-09-01 16:36:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5615616. Throughput: 0: 217.3. Samples: 401110. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:36:58,179][00194] Avg episode reward: [(0, '27.157')]
[2024-09-01 16:37:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5619712. Throughput: 0: 224.1. Samples: 401796. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:37:03,185][00194] Avg episode reward: [(0, '27.084')]
[2024-09-01 16:37:08,179][00194] Fps is (10 sec: 819.1, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 5623808. Throughput: 0: 224.3. Samples: 403122. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:37:08,182][00194] Avg episode reward: [(0, '26.167')]
[2024-09-01 16:37:13,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5627904. Throughput: 0: 213.5. Samples: 404274. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:37:13,182][00194] Avg episode reward: [(0, '26.559')]
[2024-09-01 16:37:18,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5632000. Throughput: 0: 211.1. Samples: 405036. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:37:18,188][00194] Avg episode reward: [(0, '26.834')]
[2024-09-01 16:37:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5636096. Throughput: 0: 227.5. Samples: 406764. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:23,185][00194] Avg episode reward: [(0, '26.512')]
[2024-09-01 16:37:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5640192. Throughput: 0: 240.3. Samples: 407858. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:28,180][00194] Avg episode reward: [(0, '26.180')]
[2024-09-01 16:37:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5644288. Throughput: 0: 214.3. Samples: 408384. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:37:33,182][00194] Avg episode reward: [(0, '26.672')]
[2024-09-01 16:37:38,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5652480. Throughput: 0: 225.7. Samples: 409938. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:38,180][00194] Avg episode reward: [(0, '26.354')]
[2024-09-01 16:37:41,426][26015] Updated weights for policy 0, policy_version 1381 (0.1659)
[2024-09-01 16:37:43,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5656576. Throughput: 0: 216.3. Samples: 410844. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:43,179][00194] Avg episode reward: [(0, '25.707')]
[2024-09-01 16:37:45,197][26002] Signal inference workers to stop experience collection... (400 times)
[2024-09-01 16:37:45,287][26015] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2024-09-01 16:37:46,980][26002] Signal inference workers to resume experience collection... (400 times)
[2024-09-01 16:37:46,982][26015] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2024-09-01 16:37:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5660672. Throughput: 0: 226.3. Samples: 411978. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:48,181][00194] Avg episode reward: [(0, '25.427')]
[2024-09-01 16:37:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5664768. Throughput: 0: 223.8. Samples: 413194. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:53,180][00194] Avg episode reward: [(0, '25.146')]
[2024-09-01 16:37:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5668864. Throughput: 0: 237.0. Samples: 414940. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:37:58,180][00194] Avg episode reward: [(0, '25.685')]
[2024-09-01 16:38:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5672960. Throughput: 0: 232.2. Samples: 415484. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:38:03,180][00194] Avg episode reward: [(0, '26.396')]
[2024-09-01 16:38:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5677056. Throughput: 0: 216.0. Samples: 416486. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:08,180][00194] Avg episode reward: [(0, '25.876')]
[2024-09-01 16:38:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5681152. Throughput: 0: 228.4. Samples: 418138. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:13,184][00194] Avg episode reward: [(0, '26.671')]
[2024-09-01 16:38:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5689344. Throughput: 0: 236.4. Samples: 419024. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:18,180][00194] Avg episode reward: [(0, '27.117')]
[2024-09-01 16:38:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5689344. Throughput: 0: 224.5. Samples: 420042. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:23,180][00194] Avg episode reward: [(0, '27.637')]
[2024-09-01 16:38:28,177][00194] Fps is (10 sec: 409.6, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5693440. Throughput: 0: 233.0. Samples: 421328. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2024-09-01 16:38:28,188][00194] Avg episode reward: [(0, '27.652')]
[2024-09-01 16:38:28,341][26015] Updated weights for policy 0, policy_version 1391 (0.1704)
[2024-09-01 16:38:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5701632. Throughput: 0: 229.3. Samples: 422298. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:33,186][00194] Avg episode reward: [(0, '27.748')]
[2024-09-01 16:38:38,178][00194] Fps is (10 sec: 1228.6, 60 sec: 887.4, 300 sec: 888.6). Total num frames: 5705728. Throughput: 0: 231.5. Samples: 423610. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:38,182][00194] Avg episode reward: [(0, '27.597')]
[2024-09-01 16:38:43,178][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5709824. Throughput: 0: 208.8. Samples: 424336. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:43,183][00194] Avg episode reward: [(0, '27.989')]
[2024-09-01 16:38:48,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5713920. Throughput: 0: 221.0. Samples: 425428. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:48,185][00194] Avg episode reward: [(0, '28.276')]
[2024-09-01 16:38:50,276][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001396_5718016.pth...
[2024-09-01 16:38:50,382][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth
[2024-09-01 16:38:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5718016. Throughput: 0: 240.1. Samples: 427292. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:53,183][00194] Avg episode reward: [(0, '27.585')]
[2024-09-01 16:38:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5722112. Throughput: 0: 229.6. Samples: 428470. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2024-09-01 16:38:58,180][00194] Avg episode reward: [(0, '27.747')]
[2024-09-01 16:39:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5726208. Throughput: 0: 217.2. Samples: 428796. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:39:03,182][00194] Avg episode reward: [(0, '27.350')]
[2024-09-01 16:39:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5730304. Throughput: 0: 235.5. Samples: 430640. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2024-09-01 16:39:08,180][00194] Avg episode reward: [(0, '26.992')]
[2024-09-01 16:39:12,822][26015] Updated weights for policy 0, policy_version 1401 (0.1187)
[2024-09-01 16:39:13,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5738496. Throughput: 0: 231.6. Samples: 431750. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:39:13,186][00194] Avg episode reward: [(0, '27.130')]
[2024-09-01 16:39:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 5738496. Throughput: 0: 226.0. Samples: 432468. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:39:18,181][00194] Avg episode reward: [(0, '27.574')]
[2024-09-01 16:39:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5746688. Throughput: 0: 223.6. Samples: 433670. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:39:23,186][00194] Avg episode reward: [(0, '27.670')]
[2024-09-01 16:39:28,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5750784. Throughput: 0: 244.6. Samples: 435342. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:39:28,187][00194] Avg episode reward: [(0, '28.205')]
[2024-09-01 16:39:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5754880. Throughput: 0: 232.3. Samples: 435880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:39:33,186][00194] Avg episode reward: [(0, '28.126')]
[2024-09-01 16:39:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5758976. Throughput: 0: 212.3. Samples: 436844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:39:38,180][00194] Avg episode reward: [(0, '28.321')]
[2024-09-01 16:39:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5763072. Throughput: 0: 222.5. Samples: 438484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
[2024-09-01 16:39:43,186][00194] Avg episode reward: [(0, '28.215')]
[2024-09-01 16:39:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5767168. Throughput: 0: 229.5. Samples: 439124. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:39:48,180][00194] Avg episode reward: [(0, '28.771')]
[2024-09-01 16:39:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5771264. Throughput: 0: 221.1. Samples: 440588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:39:53,184][00194] Avg episode reward: [(0, '29.294')]
[2024-09-01 16:39:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5775360. Throughput: 0: 226.7. Samples: 441952. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:39:58,180][00194] Avg episode reward: [(0, '29.630')]
[2024-09-01 16:39:59,619][26015] Updated weights for policy 0, policy_version 1411 (0.2575)
[2024-09-01 16:40:03,032][26002] Saving new best policy, reward=29.630!
[2024-09-01 16:40:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5783552. Throughput: 0: 224.9. Samples: 442588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:03,180][00194] Avg episode reward: [(0, '29.775')]
[2024-09-01 16:40:07,874][26002] Saving new best policy, reward=29.775!
[2024-09-01 16:40:08,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5787648. Throughput: 0: 230.4. Samples: 444036. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:08,184][00194] Avg episode reward: [(0, '29.622')]
[2024-09-01 16:40:13,177][00194] Fps is (10 sec: 409.6, 60 sec: 819.2, 300 sec: 874.7). Total num frames: 5787648. Throughput: 0: 215.9. Samples: 445056. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:13,182][00194] Avg episode reward: [(0, '29.296')]
[2024-09-01 16:40:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 888.6). Total num frames: 5795840. Throughput: 0: 225.8. Samples: 446042. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:18,180][00194] Avg episode reward: [(0, '28.875')]
[2024-09-01 16:40:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5799936. Throughput: 0: 235.5. Samples: 447442. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:23,179][00194] Avg episode reward: [(0, '29.872')]
[2024-09-01 16:40:25,841][26002] Saving new best policy, reward=29.872!
[2024-09-01 16:40:28,181][00194] Fps is (10 sec: 818.9, 60 sec: 887.4, 300 sec: 874.7). Total num frames: 5804032. Throughput: 0: 226.6. Samples: 448684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:28,184][00194] Avg episode reward: [(0, '29.518')]
[2024-09-01 16:40:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5808128. Throughput: 0: 223.5. Samples: 449180. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:33,179][00194] Avg episode reward: [(0, '29.178')]
[2024-09-01 16:40:38,177][00194] Fps is (10 sec: 819.5, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5812224. Throughput: 0: 228.6. Samples: 450874. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:38,181][00194] Avg episode reward: [(0, '29.242')]
[2024-09-01 16:40:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 874.7). Total num frames: 5816320. Throughput: 0: 228.0. Samples: 452210. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:40:43,180][00194] Avg episode reward: [(0, '29.231')]
[2024-09-01 16:40:44,210][26015] Updated weights for policy 0, policy_version 1421 (0.0544)
[2024-09-01 16:40:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5820416. Throughput: 0: 223.9. Samples: 452662. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:40:48,186][00194] Avg episode reward: [(0, '29.310')]
[2024-09-01 16:40:49,709][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001422_5824512.pth...
[2024-09-01 16:40:49,817][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001370_5611520.pth
[2024-09-01 16:40:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5824512. Throughput: 0: 227.0. Samples: 454250. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:40:53,180][00194] Avg episode reward: [(0, '29.215')]
[2024-09-01 16:40:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5832704. Throughput: 0: 229.6. Samples: 455390. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:40:58,187][00194] Avg episode reward: [(0, '29.728')]
[2024-09-01 16:41:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5836800. Throughput: 0: 226.2. Samples: 456220. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:03,183][00194] Avg episode reward: [(0, '29.735')]
[2024-09-01 16:41:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5840896. Throughput: 0: 218.3. Samples: 457266. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:08,188][00194] Avg episode reward: [(0, '29.735')]
[2024-09-01 16:41:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5844992. Throughput: 0: 227.1. Samples: 458904. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:13,180][00194] Avg episode reward: [(0, '29.259')]
[2024-09-01 16:41:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5849088. Throughput: 0: 231.3. Samples: 459588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:18,180][00194] Avg episode reward: [(0, '30.461')]
[2024-09-01 16:41:20,439][26002] Saving new best policy, reward=30.461!
[2024-09-01 16:41:23,183][00194] Fps is (10 sec: 818.7, 60 sec: 887.4, 300 sec: 902.5). Total num frames: 5853184. Throughput: 0: 220.0. Samples: 460776. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:23,186][00194] Avg episode reward: [(0, '30.677')]
[2024-09-01 16:41:25,831][26002] Saving new best policy, reward=30.677!
[2024-09-01 16:41:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5857280. Throughput: 0: 225.9. Samples: 462376. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:28,180][00194] Avg episode reward: [(0, '31.919')]
[2024-09-01 16:41:30,117][26015] Updated weights for policy 0, policy_version 1431 (0.1548)
[2024-09-01 16:41:32,499][26002] Signal inference workers to stop experience collection... (450 times)
[2024-09-01 16:41:32,562][26015] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2024-09-01 16:41:33,177][00194] Fps is (10 sec: 819.7, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5861376. Throughput: 0: 226.8. Samples: 462868. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:33,185][00194] Avg episode reward: [(0, '32.206')]
[2024-09-01 16:41:33,431][26002] Saving new best policy, reward=31.919!
[2024-09-01 16:41:33,433][26002] Signal inference workers to resume experience collection... (450 times)
[2024-09-01 16:41:33,443][26015] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2024-09-01 16:41:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5865472. Throughput: 0: 227.5. Samples: 464488. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:38,180][00194] Avg episode reward: [(0, '31.630')]
[2024-09-01 16:41:38,667][26002] Saving new best policy, reward=32.206!
[2024-09-01 16:41:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5869568. Throughput: 0: 225.4. Samples: 465532. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:43,190][00194] Avg episode reward: [(0, '31.349')]
[2024-09-01 16:41:48,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5877760. Throughput: 0: 227.9. Samples: 466474. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
[2024-09-01 16:41:48,180][00194] Avg episode reward: [(0, '30.458')]
[2024-09-01 16:41:53,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5881856. Throughput: 0: 232.0. Samples: 467704. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:41:53,180][00194] Avg episode reward: [(0, '30.334')]
[2024-09-01 16:41:58,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5885952. Throughput: 0: 221.0. Samples: 468848. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:41:58,185][00194] Avg episode reward: [(0, '30.870')]
[2024-09-01 16:42:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5890048. Throughput: 0: 224.7. Samples: 469698. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:03,180][00194] Avg episode reward: [(0, '30.500')]
[2024-09-01 16:42:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5894144. Throughput: 0: 233.7. Samples: 471290. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:08,187][00194] Avg episode reward: [(0, '30.023')]
[2024-09-01 16:42:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5898240. Throughput: 0: 224.9. Samples: 472496. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:42:13,182][00194] Avg episode reward: [(0, '29.890')]
[2024-09-01 16:42:15,935][26015] Updated weights for policy 0, policy_version 1441 (0.2283)
[2024-09-01 16:42:18,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5902336. Throughput: 0: 221.2. Samples: 472820. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:42:18,180][00194] Avg episode reward: [(0, '29.184')]
[2024-09-01 16:42:23,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.6, 300 sec: 902.5). Total num frames: 5906432. Throughput: 0: 226.5. Samples: 474682. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:42:23,183][00194] Avg episode reward: [(0, '29.022')]
[2024-09-01 16:42:28,182][00194] Fps is (10 sec: 1228.2, 60 sec: 955.6, 300 sec: 916.4). Total num frames: 5914624. Throughput: 0: 234.3. Samples: 476078. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:28,185][00194] Avg episode reward: [(0, '30.580')]
[2024-09-01 16:42:33,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5918720. Throughput: 0: 229.8. Samples: 476814. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:33,179][00194] Avg episode reward: [(0, '30.606')]
[2024-09-01 16:42:38,177][00194] Fps is (10 sec: 819.6, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5922816. Throughput: 0: 226.2. Samples: 477882. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:38,184][00194] Avg episode reward: [(0, '30.535')]
[2024-09-01 16:42:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 955.7, 300 sec: 902.5). Total num frames: 5926912. Throughput: 0: 236.6. Samples: 479494. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:43,180][00194] Avg episode reward: [(0, '30.297')]
[2024-09-01 16:42:48,178][00194] Fps is (10 sec: 819.1, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5931008. Throughput: 0: 233.8. Samples: 480220. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:48,186][00194] Avg episode reward: [(0, '29.842')]
[2024-09-01 16:42:50,977][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001449_5935104.pth...
[2024-09-01 16:42:51,095][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001396_5718016.pth
[2024-09-01 16:42:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5935104. Throughput: 0: 221.6. Samples: 481264. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:42:53,180][00194] Avg episode reward: [(0, '29.402')]
[2024-09-01 16:42:58,177][00194] Fps is (10 sec: 819.3, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5939200. Throughput: 0: 229.0. Samples: 482802. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:42:58,180][00194] Avg episode reward: [(0, '29.277')]
[2024-09-01 16:42:59,731][26015] Updated weights for policy 0, policy_version 1451 (0.1034)
[2024-09-01 16:43:03,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5943296. Throughput: 0: 241.0. Samples: 483664. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:43:03,182][00194] Avg episode reward: [(0, '30.124')]
[2024-09-01 16:43:08,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5947392. Throughput: 0: 225.6. Samples: 484836. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0)
[2024-09-01 16:43:08,184][00194] Avg episode reward: [(0, '30.448')]
[2024-09-01 16:43:13,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 888.6). Total num frames: 5951488. Throughput: 0: 222.8. Samples: 486104. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0)
[2024-09-01 16:43:13,184][00194] Avg episode reward: [(0, '30.254')]
[2024-09-01 16:43:18,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5959680. Throughput: 0: 226.3. Samples: 486998. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
[2024-09-01 16:43:18,180][00194] Avg episode reward: [(0, '30.282')]
[2024-09-01 16:43:23,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5963776. Throughput: 0: 234.2. Samples: 488422. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:43:23,181][00194] Avg episode reward: [(0, '30.728')]
[2024-09-01 16:43:28,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5967872. Throughput: 0: 221.6. Samples: 489468. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2024-09-01 16:43:28,186][00194] Avg episode reward: [(0, '29.194')]
[2024-09-01 16:43:33,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5971968. Throughput: 0: 221.4. Samples: 490182. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:43:33,185][00194] Avg episode reward: [(0, '29.182')]
[2024-09-01 16:43:38,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5976064. Throughput: 0: 230.0. Samples: 491614. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:43:38,180][00194] Avg episode reward: [(0, '29.130')]
[2024-09-01 16:43:43,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5980160. Throughput: 0: 229.9. Samples: 493148. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:43:43,182][00194] Avg episode reward: [(0, '28.244')]
[2024-09-01 16:43:44,795][26015] Updated weights for policy 0, policy_version 1461 (0.1507)
[2024-09-01 16:43:48,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5984256. Throughput: 0: 217.7. Samples: 493462. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:43:48,179][00194] Avg episode reward: [(0, '28.026')]
[2024-09-01 16:43:53,177][00194] Fps is (10 sec: 819.2, 60 sec: 887.5, 300 sec: 902.5). Total num frames: 5988352. Throughput: 0: 230.8. Samples: 495220. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2024-09-01 16:43:53,180][00194] Avg episode reward: [(0, '27.804')]
[2024-09-01 16:43:58,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 5996544. Throughput: 0: 233.6. Samples: 496618. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:43:58,186][00194] Avg episode reward: [(0, '29.239')]
[2024-09-01 16:44:03,177][00194] Fps is (10 sec: 1228.8, 60 sec: 955.7, 300 sec: 916.4). Total num frames: 6000640. Throughput: 0: 229.3. Samples: 497318. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2024-09-01 16:44:03,184][00194] Avg episode reward: [(0, '28.802')]
[2024-09-01 16:44:07,573][26002] Stopping Batcher_0...
[2024-09-01 16:44:07,574][26002] Loop batcher_evt_loop terminating...
[2024-09-01 16:44:07,585][00194] Component Batcher_0 stopped!
[2024-09-01 16:44:07,630][26015] Weights refcount: 2 0
[2024-09-01 16:44:07,635][00194] Component InferenceWorker_p0-w0 stopped!
[2024-09-01 16:44:07,638][26015] Stopping InferenceWorker_p0-w0...
[2024-09-01 16:44:07,642][26015] Loop inference_proc0-0_evt_loop terminating...
[2024-09-01 16:44:08,076][26021] Stopping RolloutWorker_w5...
[2024-09-01 16:44:08,086][26021] Loop rollout_proc5_evt_loop terminating...
[2024-09-01 16:44:08,077][00194] Component RolloutWorker_w5 stopped!
[2024-09-01 16:44:08,115][00194] Component RolloutWorker_w3 stopped!
[2024-09-01 16:44:08,115][26019] Stopping RolloutWorker_w3...
[2024-09-01 16:44:08,132][26019] Loop rollout_proc3_evt_loop terminating...
[2024-09-01 16:44:08,133][26023] Stopping RolloutWorker_w7...
[2024-09-01 16:44:08,141][26023] Loop rollout_proc7_evt_loop terminating...
[2024-09-01 16:44:08,134][00194] Component RolloutWorker_w7 stopped!
[2024-09-01 16:44:08,164][00194] Component RolloutWorker_w0 stopped!
[2024-09-01 16:44:08,181][00194] Component RolloutWorker_w2 stopped!
[2024-09-01 16:44:08,190][26017] Stopping RolloutWorker_w2...
[2024-09-01 16:44:08,173][26016] Stopping RolloutWorker_w0...
[2024-09-01 16:44:08,205][26016] Loop rollout_proc0_evt_loop terminating...
[2024-09-01 16:44:08,208][26017] Loop rollout_proc2_evt_loop terminating...
[2024-09-01 16:44:08,247][00194] Component RolloutWorker_w4 stopped!
[2024-09-01 16:44:08,248][26020] Stopping RolloutWorker_w4...
[2024-09-01 16:44:08,257][26020] Loop rollout_proc4_evt_loop terminating...
[2024-09-01 16:44:08,276][26018] Stopping RolloutWorker_w1...
[2024-09-01 16:44:08,276][00194] Component RolloutWorker_w1 stopped!
[2024-09-01 16:44:08,276][26018] Loop rollout_proc1_evt_loop terminating...
[2024-09-01 16:44:08,333][26022] Stopping RolloutWorker_w6...
[2024-09-01 16:44:08,332][00194] Component RolloutWorker_w6 stopped!
[2024-09-01 16:44:08,335][26022] Loop rollout_proc6_evt_loop terminating...
[2024-09-01 16:44:12,546][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth...
[2024-09-01 16:44:12,616][26002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001422_5824512.pth
[2024-09-01 16:44:12,624][26002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth...
[2024-09-01 16:44:12,714][26002] Stopping LearnerWorker_p0...
[2024-09-01 16:44:12,715][26002] Loop learner_proc0_evt_loop terminating...
[2024-09-01 16:44:12,715][00194] Component LearnerWorker_p0 stopped!
[2024-09-01 16:44:12,718][00194] Waiting for process learner_proc0 to stop...
[2024-09-01 16:44:13,178][00194] Waiting for process inference_proc0-0 to join...
[2024-09-01 16:44:13,183][00194] Waiting for process rollout_proc0 to join...
[2024-09-01 16:44:13,189][00194] Waiting for process rollout_proc1 to join...
[2024-09-01 16:44:13,193][00194] Waiting for process rollout_proc2 to join...
[2024-09-01 16:44:13,199][00194] Waiting for process rollout_proc3 to join...
[2024-09-01 16:44:13,208][00194] Waiting for process rollout_proc4 to join...
[2024-09-01 16:44:13,214][00194] Waiting for process rollout_proc5 to join...
[2024-09-01 16:44:13,220][00194] Waiting for process rollout_proc6 to join...
[2024-09-01 16:44:13,225][00194] Waiting for process rollout_proc7 to join...
[2024-09-01 16:44:13,234][00194] Batcher 0 profile tree view:
batching: 9.2162, releasing_batches: 0.1772
[2024-09-01 16:44:13,240][00194] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 30.9066
update_model: 81.0060
  weight_update: 0.1005
one_step: 0.0376
  handle_policy_step: 1436.1295
    deserialize: 44.8616, stack: 7.2046, obs_to_device_normalize: 241.7567, forward: 1055.2233, send_messages: 32.3865
    prepare_outputs: 17.1816
      to_cpu: 1.7264
[2024-09-01 16:44:13,242][00194] Learner 0 profile tree view:
misc: 0.0034, prepare_batch: 631.4118
train: 1567.9798
  epoch_init: 0.0036, minibatch_init: 0.0053, losses_postprocess: 0.0786, kl_divergence: 0.2734, after_optimizer: 1.2242
  calculate_losses: 757.4961
    losses_init: 0.0022, forward_head: 673.4290, bptt_initial: 2.1597, tail: 1.6841, advantages_returns: 0.1136, losses: 0.8179
    bptt: 79.0010
      bptt_forward_core: 78.5114
  update: 808.5816
    clip: 1.8736
[2024-09-01 16:44:13,244][00194] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2782, enqueue_policy_requests: 28.3148, env_step: 831.1422, overhead: 20.8807, complete_rollouts: 8.1561
save_policy_outputs: 22.1874
  split_output_tensors: 7.4809
[2024-09-01 16:44:13,247][00194] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3237, enqueue_policy_requests: 27.7500, env_step: 817.1254, overhead: 19.3453, complete_rollouts: 9.0290
save_policy_outputs: 21.3592
  split_output_tensors: 7.1296
[2024-09-01 16:44:13,251][00194] Loop Runner_EvtLoop terminating...
[2024-09-01 16:44:13,253][00194] Runner profile tree view:
main_loop: 2242.7525
[2024-09-01 16:44:13,254][00194] Collected {0: 6008832}, FPS: 887.6
[2024-09-01 16:49:06,149][00194] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-01 16:49:06,153][00194] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-01 16:49:06,156][00194] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-01 16:49:06,159][00194] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-01 16:49:06,162][00194] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-01 16:49:06,165][00194] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-01 16:49:06,167][00194] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-01 16:49:06,170][00194] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-01 16:49:06,171][00194] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-01 16:49:06,173][00194] Adding new argument 'hf_repository'='jarski/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-01 16:49:06,174][00194] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-01 16:49:06,175][00194] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-01 16:49:06,176][00194] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-01 16:49:06,177][00194] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-01 16:49:06,180][00194] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-01 16:49:06,214][00194] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-01 16:49:06,218][00194] RunningMeanStd input shape: (3, 72, 128)
[2024-09-01 16:49:06,223][00194] RunningMeanStd input shape: (1,)
[2024-09-01 16:49:06,266][00194] ConvEncoder: input_channels=3
[2024-09-01 16:49:06,433][00194] Conv encoder output size: 512
[2024-09-01 16:49:06,435][00194] Policy head output size: 512
[2024-09-01 16:49:06,461][00194] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001467_6008832.pth...
[2024-09-01 16:49:07,124][00194] Num frames 100...
[2024-09-01 16:49:07,354][00194] Num frames 200...
[2024-09-01 16:49:07,571][00194] Num frames 300...
[2024-09-01 16:49:07,831][00194] Num frames 400...
[2024-09-01 16:49:08,050][00194] Num frames 500...
[2024-09-01 16:49:08,267][00194] Num frames 600...
[2024-09-01 16:49:08,478][00194] Num frames 700...
[2024-09-01 16:49:08,688][00194] Num frames 800...
[2024-09-01 16:49:08,901][00194] Num frames 900...
[2024-09-01 16:49:09,119][00194] Num frames 1000...
[2024-09-01 16:49:09,328][00194] Avg episode rewards: #0: 23.710, true rewards: #0: 10.710
[2024-09-01 16:49:09,330][00194] Avg episode reward: 23.710, avg true_objective: 10.710
[2024-09-01 16:49:09,398][00194] Num frames 1100...
[2024-09-01 16:49:09,614][00194] Num frames 1200...
[2024-09-01 16:49:09,843][00194] Num frames 1300...
[2024-09-01 16:49:10,075][00194] Num frames 1400...
[2024-09-01 16:49:10,305][00194] Num frames 1500...
[2024-09-01 16:49:10,511][00194] Avg episode rewards: #0: 16.355, true rewards: #0: 7.855
[2024-09-01 16:49:10,513][00194] Avg episode reward: 16.355, avg true_objective: 7.855
[2024-09-01 16:49:10,583][00194] Num frames 1600...
[2024-09-01 16:49:10,803][00194] Num frames 1700...
[2024-09-01 16:49:11,016][00194] Num frames 1800...
[2024-09-01 16:49:11,226][00194] Num frames 1900...
[2024-09-01 16:49:11,441][00194] Num frames 2000...
[2024-09-01 16:49:11,660][00194] Num frames 2100...
[2024-09-01 16:49:11,879][00194] Num frames 2200...
[2024-09-01 16:49:12,121][00194] Num frames 2300...
[2024-09-01 16:49:12,410][00194] Num frames 2400...
[2024-09-01 16:49:12,701][00194] Num frames 2500...
[2024-09-01 16:49:12,990][00194] Num frames 2600...
[2024-09-01 16:49:13,269][00194] Num frames 2700...
[2024-09-01 16:49:13,547][00194] Num frames 2800...
[2024-09-01 16:49:13,834][00194] Num frames 2900...
[2024-09-01 16:49:14,134][00194] Num frames 3000...
[2024-09-01 16:49:14,217][00194] Avg episode rewards: #0: 23.360, true rewards: #0: 10.027
[2024-09-01 16:49:14,220][00194] Avg episode reward: 23.360, avg true_objective: 10.027
[2024-09-01 16:49:14,490][00194] Num frames 3100...
[2024-09-01 16:49:14,772][00194] Num frames 3200...
[2024-09-01 16:49:15,076][00194] Num frames 3300...
[2024-09-01 16:49:15,334][00194] Num frames 3400...
[2024-09-01 16:49:15,572][00194] Avg episode rewards: #0: 19.970, true rewards: #0: 8.720
[2024-09-01 16:49:15,574][00194] Avg episode reward: 19.970, avg true_objective: 8.720
[2024-09-01 16:49:15,604][00194] Num frames 3500...
[2024-09-01 16:49:15,809][00194] Num frames 3600...
[2024-09-01 16:49:16,017][00194] Num frames 3700...
[2024-09-01 16:49:16,239][00194] Num frames 3800...
[2024-09-01 16:49:16,451][00194] Num frames 3900...
[2024-09-01 16:49:16,663][00194] Num frames 4000...
[2024-09-01 16:49:16,786][00194] Avg episode rewards: #0: 17.864, true rewards: #0: 8.064
[2024-09-01 16:49:16,788][00194] Avg episode reward: 17.864, avg true_objective: 8.064
[2024-09-01 16:49:16,929][00194] Num frames 4100...
[2024-09-01 16:49:17,154][00194] Num frames 4200...
[2024-09-01 16:49:17,358][00194] Num frames 4300...
[2024-09-01 16:49:17,559][00194] Num frames 4400...
[2024-09-01 16:49:17,767][00194] Num frames 4500...
[2024-09-01 16:49:17,925][00194] Avg episode rewards: #0: 16.073, true rewards: #0: 7.573
[2024-09-01 16:49:17,927][00194] Avg episode reward: 16.073, avg true_objective: 7.573
[2024-09-01 16:49:18,047][00194] Num frames 4600...
[2024-09-01 16:49:18,282][00194] Num frames 4700...
[2024-09-01 16:49:18,500][00194] Num frames 4800...
[2024-09-01 16:49:18,719][00194] Num frames 4900...
[2024-09-01 16:49:18,929][00194] Num frames 5000...
[2024-09-01 16:49:19,154][00194] Num frames 5100...
[2024-09-01 16:49:19,383][00194] Num frames 5200...
[2024-09-01 16:49:19,621][00194] Num frames 5300...
[2024-09-01 16:49:19,842][00194] Num frames 5400...
[2024-09-01 16:49:20,061][00194] Num frames 5500...
[2024-09-01 16:49:20,293][00194] Num frames 5600...
[2024-09-01 16:49:20,497][00194] Num frames 5700...
[2024-09-01 16:49:20,708][00194] Num frames 5800...
[2024-09-01 16:49:20,915][00194] Num frames 5900...
[2024-09-01 16:49:21,087][00194] Avg episode rewards: #0: 18.931, true rewards: #0: 8.503
[2024-09-01 16:49:21,091][00194] Avg episode reward: 18.931, avg true_objective: 8.503
[2024-09-01 16:49:21,191][00194] Num frames 6000...
[2024-09-01 16:49:21,410][00194] Num frames 6100...
[2024-09-01 16:49:21,625][00194] Num frames 6200...
[2024-09-01 16:49:21,831][00194] Num frames 6300...
[2024-09-01 16:49:22,044][00194] Num frames 6400...
[2024-09-01 16:49:22,269][00194] Num frames 6500...
[2024-09-01 16:49:22,490][00194] Num frames 6600...
[2024-09-01 16:49:22,705][00194] Num frames 6700...
[2024-09-01 16:49:22,943][00194] Num frames 6800...
[2024-09-01 16:49:23,176][00194] Num frames 6900...
[2024-09-01 16:49:23,417][00194] Num frames 7000...
[2024-09-01 16:49:23,654][00194] Num frames 7100...
[2024-09-01 16:49:23,871][00194] Num frames 7200...
[2024-09-01 16:49:24,100][00194] Num frames 7300...
[2024-09-01 16:49:24,317][00194] Num frames 7400...
[2024-09-01 16:49:24,542][00194] Num frames 7500...
[2024-09-01 16:49:24,760][00194] Avg episode rewards: #0: 21.964, true rewards: #0: 9.464
[2024-09-01 16:49:24,762][00194] Avg episode reward: 21.964, avg true_objective: 9.464
[2024-09-01 16:49:24,832][00194] Num frames 7600...
[2024-09-01 16:49:25,063][00194] Num frames 7700...
[2024-09-01 16:49:25,328][00194] Num frames 7800...
[2024-09-01 16:49:25,643][00194] Num frames 7900...
[2024-09-01 16:49:25,922][00194] Num frames 8000...
[2024-09-01 16:49:26,197][00194] Num frames 8100...
[2024-09-01 16:49:26,486][00194] Num frames 8200...
[2024-09-01 16:49:26,807][00194] Num frames 8300...
[2024-09-01 16:49:27,106][00194] Num frames 8400...
[2024-09-01 16:49:27,399][00194] Num frames 8500...
[2024-09-01 16:49:27,493][00194] Avg episode rewards: #0: 21.791, true rewards: #0: 9.458
[2024-09-01 16:49:27,497][00194] Avg episode reward: 21.791, avg true_objective: 9.458
[2024-09-01 16:49:27,755][00194] Num frames 8600...
[2024-09-01 16:49:28,054][00194] Num frames 8700...
[2024-09-01 16:49:28,371][00194] Num frames 8800...
[2024-09-01 16:49:28,609][00194] Num frames 8900...
[2024-09-01 16:49:28,835][00194] Num frames 9000...
[2024-09-01 16:49:28,942][00194] Avg episode rewards: #0: 20.724, true rewards: #0: 9.024
[2024-09-01 16:49:28,943][00194] Avg episode reward: 20.724, avg true_objective: 9.024
[2024-09-01 16:50:30,694][00194] Replay video saved to /content/train_dir/default_experiment/replay.mp4!