[2024-12-30 00:30:26,609][01374] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-30 00:30:26,612][01374] Rollout worker 0 uses device cpu
[2024-12-30 00:30:26,614][01374] Rollout worker 1 uses device cpu
[2024-12-30 00:30:26,615][01374] Rollout worker 2 uses device cpu
[2024-12-30 00:30:26,616][01374] Rollout worker 3 uses device cpu
[2024-12-30 00:30:26,617][01374] Rollout worker 4 uses device cpu
[2024-12-30 00:30:26,618][01374] Rollout worker 5 uses device cpu
[2024-12-30 00:30:26,619][01374] Rollout worker 6 uses device cpu
[2024-12-30 00:30:26,620][01374] Rollout worker 7 uses device cpu
[2024-12-30 00:30:26,774][01374] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 00:30:26,775][01374] InferenceWorker_p0-w0: min num requests: 2
[2024-12-30 00:30:26,808][01374] Starting all processes...
[2024-12-30 00:30:26,810][01374] Starting process learner_proc0
[2024-12-30 00:30:26,855][01374] Starting all processes...
[2024-12-30 00:30:26,865][01374] Starting process inference_proc0-0
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc0
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc1
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc2
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc3
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc4
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc5
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc6
[2024-12-30 00:30:26,865][01374] Starting process rollout_proc7
[2024-12-30 00:30:44,187][04355] Worker 2 uses CPU cores [0]
[2024-12-30 00:30:44,318][04357] Worker 3 uses CPU cores [1]
[2024-12-30 00:30:44,354][04358] Worker 5 uses CPU cores [1]
[2024-12-30 00:30:44,406][04339] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 00:30:44,408][04339] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-30 00:30:44,447][04339] Num visible devices: 1
[2024-12-30 00:30:44,472][04339] Starting seed is not provided
[2024-12-30 00:30:44,473][04339] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 00:30:44,474][04339] Initializing actor-critic model on device cuda:0
[2024-12-30 00:30:44,475][04339] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 00:30:44,478][04339] RunningMeanStd input shape: (1,)
[2024-12-30 00:30:44,503][04353] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 00:30:44,504][04353] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-30 00:30:44,528][04339] ConvEncoder: input_channels=3
[2024-12-30 00:30:44,555][04356] Worker 4 uses CPU cores [0]
[2024-12-30 00:30:44,576][04353] Num visible devices: 1
[2024-12-30 00:30:44,649][04354] Worker 1 uses CPU cores [1]
[2024-12-30 00:30:44,657][04360] Worker 7 uses CPU cores [1]
[2024-12-30 00:30:44,694][04359] Worker 6 uses CPU cores [0]
[2024-12-30 00:30:44,722][04352] Worker 0 uses CPU cores [0]
[2024-12-30 00:30:44,824][04339] Conv encoder output size: 512
[2024-12-30 00:30:44,825][04339] Policy head output size: 512
[2024-12-30 00:30:44,878][04339] Created Actor Critic model with architecture:
[2024-12-30 00:30:44,878][04339] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-30 00:30:45,200][04339] Using optimizer
[2024-12-30 00:30:46,775][01374] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-30 00:30:46,777][01374] Heartbeat connected on Batcher_0
[2024-12-30 00:30:46,783][01374] Heartbeat connected on RolloutWorker_w0
[2024-12-30 00:30:46,790][01374] Heartbeat connected on RolloutWorker_w1
[2024-12-30 00:30:46,791][01374] Heartbeat connected on RolloutWorker_w2
[2024-12-30 00:30:46,795][01374] Heartbeat connected on RolloutWorker_w3
[2024-12-30 00:30:46,798][01374] Heartbeat connected on RolloutWorker_w4
[2024-12-30 00:30:46,806][01374] Heartbeat connected on RolloutWorker_w5
[2024-12-30 00:30:46,809][01374] Heartbeat connected on RolloutWorker_w6
[2024-12-30 00:30:46,813][01374] Heartbeat connected on RolloutWorker_w7
[2024-12-30 00:30:49,792][04339] No checkpoints found
[2024-12-30 00:30:49,792][04339] Did not load from checkpoint, starting from scratch!
[2024-12-30 00:30:49,793][04339] Initialized policy 0 weights for model version 0
[2024-12-30 00:30:49,796][04339] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 00:30:49,803][04339] LearnerWorker_p0 finished initialization!
[2024-12-30 00:30:49,805][01374] Heartbeat connected on LearnerWorker_p0
[2024-12-30 00:30:49,894][04353] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 00:30:49,895][04353] RunningMeanStd input shape: (1,)
[2024-12-30 00:30:49,907][04353] ConvEncoder: input_channels=3
[2024-12-30 00:30:49,971][01374] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 00:30:50,017][04353] Conv encoder output size: 512
[2024-12-30 00:30:50,018][04353] Policy head output size: 512
[2024-12-30 00:30:50,072][01374] Inference worker 0-0 is ready!
[2024-12-30 00:30:50,074][01374] All inference workers are ready! Signal rollout workers to start!
[2024-12-30 00:30:50,275][04352] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,275][04357] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,282][04354] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,280][04355] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,283][04360] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,281][04359] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,278][04358] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:50,284][04356] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 00:30:51,666][04356] Decorrelating experience for 0 frames...
[2024-12-30 00:30:51,667][04355] Decorrelating experience for 0 frames...
[2024-12-30 00:30:51,668][04352] Decorrelating experience for 0 frames...
[2024-12-30 00:30:51,666][04354] Decorrelating experience for 0 frames...
[2024-12-30 00:30:51,666][04357] Decorrelating experience for 0 frames...
[2024-12-30 00:30:51,669][04358] Decorrelating experience for 0 frames...
[2024-12-30 00:30:52,050][04357] Decorrelating experience for 32 frames...
[2024-12-30 00:30:52,899][04358] Decorrelating experience for 32 frames...
[2024-12-30 00:30:53,078][04357] Decorrelating experience for 64 frames...
[2024-12-30 00:30:53,193][04355] Decorrelating experience for 32 frames...
[2024-12-30 00:30:53,198][04352] Decorrelating experience for 32 frames...
[2024-12-30 00:30:53,195][04356] Decorrelating experience for 32 frames...
[2024-12-30 00:30:53,200][04359] Decorrelating experience for 0 frames...
[2024-12-30 00:30:54,239][04359] Decorrelating experience for 32 frames...
[2024-12-30 00:30:54,450][04355] Decorrelating experience for 64 frames...
[2024-12-30 00:30:54,525][04358] Decorrelating experience for 64 frames...
[2024-12-30 00:30:54,794][04357] Decorrelating experience for 96 frames...
[2024-12-30 00:30:54,809][04360] Decorrelating experience for 0 frames...
[2024-12-30 00:30:54,813][04354] Decorrelating experience for 32 frames...
[2024-12-30 00:30:54,971][01374] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 00:30:55,655][04355] Decorrelating experience for 96 frames...
[2024-12-30 00:30:55,694][04360] Decorrelating experience for 32 frames...
[2024-12-30 00:30:55,914][04352] Decorrelating experience for 64 frames...
[2024-12-30 00:30:56,394][04359] Decorrelating experience for 64 frames...
[2024-12-30 00:30:56,442][04356] Decorrelating experience for 64 frames...
[2024-12-30 00:30:56,521][04354] Decorrelating experience for 64 frames...
[2024-12-30 00:30:57,862][04352] Decorrelating experience for 96 frames...
[2024-12-30 00:30:58,012][04358] Decorrelating experience for 96 frames...
[2024-12-30 00:30:58,165][04360] Decorrelating experience for 64 frames...
[2024-12-30 00:30:58,482][04354] Decorrelating experience for 96 frames...
[2024-12-30 00:30:58,540][04359] Decorrelating experience for 96 frames...
[2024-12-30 00:30:59,971][01374] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 106.6. Samples: 1066. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 00:30:59,973][01374] Avg episode reward: [(0, '2.308')]
[2024-12-30 00:31:03,795][04360] Decorrelating experience for 96 frames...
[2024-12-30 00:31:04,041][04339] Signal inference workers to stop experience collection...
[2024-12-30 00:31:04,061][04353] InferenceWorker_p0-w0: stopping experience collection
[2024-12-30 00:31:04,974][01374] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 157.7. Samples: 2366. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 00:31:04,979][01374] Avg episode reward: [(0, '3.160')]
[2024-12-30 00:31:05,060][04356] Decorrelating experience for 96 frames...
[2024-12-30 00:31:06,370][04339] Signal inference workers to resume experience collection...
[2024-12-30 00:31:06,371][04353] InferenceWorker_p0-w0: resuming experience collection
[2024-12-30 00:31:09,971][01374] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 229.6. Samples: 4592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 00:31:09,978][01374] Avg episode reward: [(0, '3.693')]
[2024-12-30 00:31:14,174][04353] Updated weights for policy 0, policy_version 10 (0.0149)
[2024-12-30 00:31:14,971][01374] Fps is (10 sec: 4097.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 422.5. Samples: 10562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 00:31:14,977][01374] Avg episode reward: [(0, '4.147')]
[2024-12-30 00:31:19,972][01374] Fps is (10 sec: 3276.5, 60 sec: 1911.4, 300 sec: 1911.4). Total num frames: 57344. Throughput: 0: 437.4. Samples: 13122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:31:19,974][01374] Avg episode reward: [(0, '4.349')]
[2024-12-30 00:31:24,974][01374] Fps is (10 sec: 3275.7, 60 sec: 2106.3, 300 sec: 2106.3). Total num frames: 73728. Throughput: 0: 510.8. Samples: 17880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:31:24,978][01374] Avg episode reward: [(0, '4.421')]
[2024-12-30 00:31:25,851][04353] Updated weights for policy 0, policy_version 20 (0.0024)
[2024-12-30 00:31:29,971][01374] Fps is (10 sec: 4096.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 622.8. Samples: 24910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:31:29,976][01374] Avg episode reward: [(0, '4.416')]
[2024-12-30 00:31:34,971][01374] Fps is (10 sec: 4507.2, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 631.9. Samples: 28436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:31:34,975][01374] Avg episode reward: [(0, '4.435')]
[2024-12-30 00:31:34,981][04339] Saving new best policy, reward=4.435!
[2024-12-30 00:31:36,246][04353] Updated weights for policy 0, policy_version 30 (0.0017)
[2024-12-30 00:31:39,971][01374] Fps is (10 sec: 3276.8, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 131072. Throughput: 0: 724.7. Samples: 32610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:31:39,977][01374] Avg episode reward: [(0, '4.588')]
[2024-12-30 00:31:40,039][04339] Saving new best policy, reward=4.588!
[2024-12-30 00:31:44,971][01374] Fps is (10 sec: 3686.4, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 847.8. Samples: 39218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:31:44,975][01374] Avg episode reward: [(0, '4.572')]
[2024-12-30 00:31:46,064][04353] Updated weights for policy 0, policy_version 40 (0.0023)
[2024-12-30 00:31:49,971][01374] Fps is (10 sec: 4915.2, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 897.7. Samples: 42758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:31:49,973][01374] Avg episode reward: [(0, '4.494')]
[2024-12-30 00:31:54,971][01374] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 967.3. Samples: 48122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:31:54,978][01374] Avg episode reward: [(0, '4.501')]
[2024-12-30 00:31:57,290][04353] Updated weights for policy 0, policy_version 50 (0.0023)
[2024-12-30 00:31:59,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 960.6. Samples: 53790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-30 00:31:59,973][01374] Avg episode reward: [(0, '4.554')]
[2024-12-30 00:32:04,971][01374] Fps is (10 sec: 4505.8, 60 sec: 3959.7, 300 sec: 3167.6). Total num frames: 237568. Throughput: 0: 983.3. Samples: 57368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:32:04,978][01374] Avg episode reward: [(0, '4.436')]
[2024-12-30 00:32:06,134][04353] Updated weights for policy 0, policy_version 60 (0.0024)
[2024-12-30 00:32:09,972][01374] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3174.3). Total num frames: 253952. Throughput: 0: 1015.2. Samples: 63562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:32:09,976][01374] Avg episode reward: [(0, '4.314')]
[2024-12-30 00:32:14,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3228.6). Total num frames: 274432. Throughput: 0: 959.7. Samples: 68098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:32:14,973][01374] Avg episode reward: [(0, '4.346')]
[2024-12-30 00:32:17,674][04353] Updated weights for policy 0, policy_version 70 (0.0014)
[2024-12-30 00:32:19,971][01374] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3276.8). Total num frames: 294912. Throughput: 0: 959.2. Samples: 71598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:32:19,976][01374] Avg episode reward: [(0, '4.214')]
[2024-12-30 00:32:19,984][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth...
[2024-12-30 00:32:24,971][01374] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 1023.1. Samples: 78648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:32:24,975][01374] Avg episode reward: [(0, '4.231')]
[2024-12-30 00:32:28,101][04353] Updated weights for policy 0, policy_version 80 (0.0023)
[2024-12-30 00:32:29,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 976.8. Samples: 83172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:32:29,977][01374] Avg episode reward: [(0, '4.408')]
[2024-12-30 00:32:34,977][01374] Fps is (10 sec: 3683.9, 60 sec: 3890.8, 300 sec: 3354.6). Total num frames: 352256. Throughput: 0: 957.5. Samples: 85854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:32:34,982][01374] Avg episode reward: [(0, '4.562')]
[2024-12-30 00:32:38,002][04353] Updated weights for policy 0, policy_version 90 (0.0021)
[2024-12-30 00:32:39,971][01374] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3425.7). Total num frames: 376832. Throughput: 0: 992.6. Samples: 92788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:32:39,973][01374] Avg episode reward: [(0, '4.397')]
[2024-12-30 00:32:44,971][01374] Fps is (10 sec: 4098.7, 60 sec: 3959.5, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 992.7. Samples: 98460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:32:44,973][01374] Avg episode reward: [(0, '4.311')]
[2024-12-30 00:32:49,515][04353] Updated weights for policy 0, policy_version 100 (0.0013)
[2024-12-30 00:32:49,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 959.5. Samples: 100546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:32:49,977][01374] Avg episode reward: [(0, '4.403')]
[2024-12-30 00:32:54,971][01374] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 970.3. Samples: 107222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:32:54,977][01374] Avg episode reward: [(0, '4.431')]
[2024-12-30 00:32:58,113][04353] Updated weights for policy 0, policy_version 110 (0.0017)
[2024-12-30 00:32:59,971][01374] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 1018.0. Samples: 113906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:32:59,977][01374] Avg episode reward: [(0, '4.440')]
[2024-12-30 00:33:04,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 988.5. Samples: 116082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:33:04,977][01374] Avg episode reward: [(0, '4.476')]
[2024-12-30 00:33:09,705][04353] Updated weights for policy 0, policy_version 120 (0.0017)
[2024-12-30 00:33:09,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 951.1. Samples: 121448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:33:09,972][01374] Avg episode reward: [(0, '4.497')]
[2024-12-30 00:33:14,971][01374] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3559.3). Total num frames: 516096. Throughput: 0: 1010.0. Samples: 128624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:33:14,973][01374] Avg episode reward: [(0, '4.497')]
[2024-12-30 00:33:19,972][01374] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3522.5). Total num frames: 528384. Throughput: 0: 1013.1. Samples: 131438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:33:19,977][01374] Avg episode reward: [(0, '4.482')]
[2024-12-30 00:33:20,316][04353] Updated weights for policy 0, policy_version 130 (0.0017)
[2024-12-30 00:33:24,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3541.1). Total num frames: 548864. Throughput: 0: 959.3. Samples: 135958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:33:24,973][01374] Avg episode reward: [(0, '4.589')]
[2024-12-30 00:33:24,976][04339] Saving new best policy, reward=4.589!
[2024-12-30 00:33:29,833][04353] Updated weights for policy 0, policy_version 140 (0.0013)
[2024-12-30 00:33:29,971][01374] Fps is (10 sec: 4506.3, 60 sec: 4027.7, 300 sec: 3584.0). Total num frames: 573440. Throughput: 0: 987.6. Samples: 142902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:33:29,973][01374] Avg episode reward: [(0, '4.557')]
[2024-12-30 00:33:34,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 1019.6. Samples: 146428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:33:34,973][01374] Avg episode reward: [(0, '4.498')]
[2024-12-30 00:33:39,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3565.9). Total num frames: 606208. Throughput: 0: 971.5. Samples: 150940. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 00:33:39,975][01374] Avg episode reward: [(0, '4.451')]
[2024-12-30 00:33:41,813][04353] Updated weights for policy 0, policy_version 150 (0.0030)
[2024-12-30 00:33:44,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3581.1). Total num frames: 626688. Throughput: 0: 952.1. Samples: 156750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:33:44,977][01374] Avg episode reward: [(0, '4.342')]
[2024-12-30 00:33:49,971][01374] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3595.4). Total num frames: 647168. Throughput: 0: 977.4. Samples: 160064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:33:49,974][01374] Avg episode reward: [(0, '4.548')]
[2024-12-30 00:33:51,250][04353] Updated weights for policy 0, policy_version 160 (0.0020)
[2024-12-30 00:33:54,973][01374] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3586.7). Total num frames: 663552. Throughput: 0: 979.4. Samples: 165524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 00:33:54,975][01374] Avg episode reward: [(0, '4.605')]
[2024-12-30 00:33:54,977][04339] Saving new best policy, reward=4.605!
[2024-12-30 00:33:59,973][01374] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3600.1). Total num frames: 684032. Throughput: 0: 934.5. Samples: 170680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:33:59,975][01374] Avg episode reward: [(0, '4.557')]
[2024-12-30 00:34:02,510][04353] Updated weights for policy 0, policy_version 170 (0.0026)
[2024-12-30 00:34:04,971][01374] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 951.8. Samples: 174268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:34:04,975][01374] Avg episode reward: [(0, '4.551')]
[2024-12-30 00:34:09,972][01374] Fps is (10 sec: 4096.4, 60 sec: 3891.1, 300 sec: 3624.9). Total num frames: 724992. Throughput: 0: 995.7. Samples: 180764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:34:09,977][01374] Avg episode reward: [(0, '4.564')]
[2024-12-30 00:34:13,572][04353] Updated weights for policy 0, policy_version 180 (0.0019)
[2024-12-30 00:34:14,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3616.5). Total num frames: 741376. Throughput: 0: 936.8. Samples: 185056. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-30 00:34:14,978][01374] Avg episode reward: [(0, '4.447')]
[2024-12-30 00:34:19,971][01374] Fps is (10 sec: 3686.9, 60 sec: 3891.3, 300 sec: 3627.9). Total num frames: 761856. Throughput: 0: 930.0. Samples: 188280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:34:19,973][01374] Avg episode reward: [(0, '4.699')]
[2024-12-30 00:34:19,979][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth...
[2024-12-30 00:34:20,103][04339] Saving new best policy, reward=4.699!
[2024-12-30 00:34:22,881][04353] Updated weights for policy 0, policy_version 190 (0.0024)
[2024-12-30 00:34:24,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3657.8). Total num frames: 786432. Throughput: 0: 984.0. Samples: 195222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 00:34:24,973][01374] Avg episode reward: [(0, '4.786')]
[2024-12-30 00:34:24,977][04339] Saving new best policy, reward=4.786!
[2024-12-30 00:34:29,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 961.2. Samples: 200004. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-30 00:34:29,977][01374] Avg episode reward: [(0, '4.754')]
[2024-12-30 00:34:34,515][04353] Updated weights for policy 0, policy_version 200 (0.0026)
[2024-12-30 00:34:34,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3640.9). Total num frames: 819200. Throughput: 0: 941.3. Samples: 202424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 00:34:34,975][01374] Avg episode reward: [(0, '4.793')]
[2024-12-30 00:34:34,979][04339] Saving new best policy, reward=4.793!
[2024-12-30 00:34:39,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3668.6). Total num frames: 843776. Throughput: 0: 976.1. Samples: 209448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:34:39,976][01374] Avg episode reward: [(0, '4.613')]
[2024-12-30 00:34:44,529][04353] Updated weights for policy 0, policy_version 210 (0.0022)
[2024-12-30 00:34:44,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3660.3). Total num frames: 860160. Throughput: 0: 991.0. Samples: 215274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:34:44,975][01374] Avg episode reward: [(0, '4.703')]
[2024-12-30 00:34:49,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3652.3). Total num frames: 876544. Throughput: 0: 960.0. Samples: 217466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:34:49,978][01374] Avg episode reward: [(0, '4.661')]
[2024-12-30 00:34:54,809][04353] Updated weights for policy 0, policy_version 220 (0.0029)
[2024-12-30 00:34:54,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3678.0). Total num frames: 901120. Throughput: 0: 957.4. Samples: 223844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:34:54,978][01374] Avg episode reward: [(0, '4.581')]
[2024-12-30 00:34:59,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3686.4). Total num frames: 921600. Throughput: 0: 1018.3. Samples: 230880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:34:59,973][01374] Avg episode reward: [(0, '4.543')]
[2024-12-30 00:35:04,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3662.3). Total num frames: 933888. Throughput: 0: 993.6. Samples: 232990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:35:04,973][01374] Avg episode reward: [(0, '4.510')]
[2024-12-30 00:35:06,286][04353] Updated weights for policy 0, policy_version 230 (0.0022)
[2024-12-30 00:35:09,971][01374] Fps is (10 sec: 3686.2, 60 sec: 3891.3, 300 sec: 3686.4). Total num frames: 958464. Throughput: 0: 958.0. Samples: 238334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 00:35:09,977][01374] Avg episode reward: [(0, '4.540')]
[2024-12-30 00:35:14,956][04353] Updated weights for policy 0, policy_version 240 (0.0017)
[2024-12-30 00:35:14,971][01374] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3709.6). Total num frames: 983040. Throughput: 0: 1007.5. Samples: 245342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:35:14,973][01374] Avg episode reward: [(0, '4.788')]
[2024-12-30 00:35:19,971][01374] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 995328. Throughput: 0: 1019.7. Samples: 248310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:35:19,973][01374] Avg episode reward: [(0, '4.898')]
[2024-12-30 00:35:19,985][04339] Saving new best policy, reward=4.898!
[2024-12-30 00:35:24,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3693.8). Total num frames: 1015808. Throughput: 0: 961.6. Samples: 252722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:35:24,973][01374] Avg episode reward: [(0, '4.844')]
[2024-12-30 00:35:26,348][04353] Updated weights for policy 0, policy_version 250 (0.0034)
[2024-12-30 00:35:29,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3701.0). Total num frames: 1036288. Throughput: 0: 986.1. Samples: 259650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:35:29,977][01374] Avg episode reward: [(0, '5.051')]
[2024-12-30 00:35:30,039][04339] Saving new best policy, reward=5.051!
[2024-12-30 00:35:34,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3708.0). Total num frames: 1056768. Throughput: 0: 1014.1. Samples: 263100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:35:34,974][01374] Avg episode reward: [(0, '4.946')]
[2024-12-30 00:35:36,675][04353] Updated weights for policy 0, policy_version 260 (0.0042)
[2024-12-30 00:35:39,974][01374] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3700.5). Total num frames: 1073152. Throughput: 0: 976.2. Samples: 267774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:35:39,979][01374] Avg episode reward: [(0, '4.911')]
[2024-12-30 00:35:44,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1093632. Throughput: 0: 951.6. Samples: 273702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:35:44,973][01374] Avg episode reward: [(0, '4.949')]
[2024-12-30 00:35:47,236][04353] Updated weights for policy 0, policy_version 270 (0.0021)
[2024-12-30 00:35:49,971][01374] Fps is (10 sec: 4506.1, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 977.0. Samples: 276954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:35:49,978][01374] Avg episode reward: [(0, '5.408')]
[2024-12-30 00:35:49,989][04339] Saving new best policy, reward=5.408!
[2024-12-30 00:35:54,971][01374] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 975.6. Samples: 282234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:35:54,978][01374] Avg episode reward: [(0, '5.681')]
[2024-12-30 00:35:54,981][04339] Saving new best policy, reward=5.681!
[2024-12-30 00:35:59,501][04353] Updated weights for policy 0, policy_version 280 (0.0036)
[2024-12-30 00:35:59,971][01374] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3887.8). Total num frames: 1146880. Throughput: 0: 925.3. Samples: 286982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:35:59,973][01374] Avg episode reward: [(0, '5.781')]
[2024-12-30 00:35:59,983][04339] Saving new best policy, reward=5.781!
[2024-12-30 00:36:04,971][01374] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1171456. Throughput: 0: 933.9. Samples: 290336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:36:04,978][01374] Avg episode reward: [(0, '5.650')]
[2024-12-30 00:36:09,483][04353] Updated weights for policy 0, policy_version 290 (0.0043)
[2024-12-30 00:36:09,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 1187840. Throughput: 0: 975.1. Samples: 296602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:36:09,974][01374] Avg episode reward: [(0, '5.694')]
[2024-12-30 00:36:14,973][01374] Fps is (10 sec: 2866.6, 60 sec: 3618.0, 300 sec: 3873.8). Total num frames: 1200128. Throughput: 0: 912.3. Samples: 300704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 00:36:14,975][01374] Avg episode reward: [(0, '5.932')]
[2024-12-30 00:36:14,979][04339] Saving new best policy, reward=5.932!
[2024-12-30 00:36:19,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.7). Total num frames: 1224704. Throughput: 0: 906.0. Samples: 303872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:36:19,973][01374] Avg episode reward: [(0, '6.166')]
[2024-12-30 00:36:19,980][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth...
[2024-12-30 00:36:20,104][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth
[2024-12-30 00:36:20,127][04339] Saving new best policy, reward=6.166!
[2024-12-30 00:36:20,583][04353] Updated weights for policy 0, policy_version 300 (0.0026)
[2024-12-30 00:36:24,971][01374] Fps is (10 sec: 4506.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1245184. Throughput: 0: 949.8. Samples: 310514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:36:24,974][01374] Avg episode reward: [(0, '6.401')]
[2024-12-30 00:36:24,981][04339] Saving new best policy, reward=6.401!
[2024-12-30 00:36:29,971][01374] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1257472. Throughput: 0: 921.6. Samples: 315176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:36:29,975][01374] Avg episode reward: [(0, '6.494')]
[2024-12-30 00:36:30,069][04339] Saving new best policy, reward=6.494!
[2024-12-30 00:36:32,617][04353] Updated weights for policy 0, policy_version 310 (0.0038)
[2024-12-30 00:36:34,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 1277952. Throughput: 0: 895.7. Samples: 317260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 00:36:34,972][01374] Avg episode reward: [(0, '6.459')]
[2024-12-30 00:36:39,971][01374] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1298432. Throughput: 0: 930.1. Samples: 324086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 00:36:39,976][01374] Avg episode reward: [(0, '6.491')]
[2024-12-30 00:36:41,967][04353] Updated weights for policy 0, policy_version 320 (0.0029)
[2024-12-30 00:36:44,971][01374] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1314816. Throughput: 0: 949.9. Samples: 329728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:36:44,974][01374] Avg episode reward: [(0, '6.593')]
[2024-12-30 00:36:44,992][04339] Saving new best policy, reward=6.593!
[2024-12-30 00:36:49,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3860.0). Total num frames: 1331200. Throughput: 0: 920.3. Samples: 331748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 00:36:49,973][01374] Avg episode reward: [(0, '6.331')]
[2024-12-30 00:36:53,982][04353] Updated weights for policy 0, policy_version 330 (0.0020)
[2024-12-30 00:36:54,971][01374] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1355776. Throughput: 0: 911.0. Samples: 337596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:36:54,977][01374] Avg episode reward: [(0, '6.456')]
[2024-12-30 00:36:59,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1376256. Throughput: 0: 969.1. Samples: 344310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-30 00:36:59,975][01374] Avg episode reward: [(0, '6.848')]
[2024-12-30 00:36:59,981][04339] Saving new best policy, reward=6.848!
[2024-12-30 00:37:04,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 1388544. Throughput: 0: 945.4. Samples: 346416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 00:37:04,977][01374] Avg episode reward: [(0, '7.099')]
[2024-12-30 00:37:04,984][04339] Saving new best policy, reward=7.099!
[2024-12-30 00:37:05,725][04353] Updated weights for policy 0, policy_version 340 (0.0023)
[2024-12-30 00:37:09,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1409024. Throughput: 0: 901.0. Samples: 351060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:37:09,973][01374] Avg episode reward: [(0, '7.349')]
[2024-12-30 00:37:09,981][04339] Saving new best policy, reward=7.349!
[2024-12-30 00:37:14,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 1429504. Throughput: 0: 943.6. Samples: 357638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 00:37:14,978][01374] Avg episode reward: [(0, '7.773')]
[2024-12-30 00:37:14,981][04339] Saving new best policy, reward=7.773!
[2024-12-30 00:37:15,335][04353] Updated weights for policy 0, policy_version 350 (0.0020)
[2024-12-30 00:37:19,973][01374] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1445888. Throughput: 0: 965.2. Samples: 360692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 00:37:19,980][01374] Avg episode reward: [(0, '7.921')]
[2024-12-30 00:37:19,995][04339] Saving new best policy, reward=7.921!
[2024-12-30 00:37:24,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 1462272. Throughput: 0: 901.4. Samples: 364650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 00:37:24,977][01374] Avg episode reward: [(0, '8.145')]
[2024-12-30 00:37:24,981][04339] Saving new best policy, reward=8.145!
[2024-12-30 00:37:27,481][04353] Updated weights for policy 0, policy_version 360 (0.0044)
[2024-12-30 00:37:29,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.3). Total num frames: 1482752. Throughput: 0: 919.6. Samples: 371108.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:37:29,973][01374] Avg episode reward: [(0, '8.378')] [2024-12-30 00:37:29,980][04339] Saving new best policy, reward=8.378! [2024-12-30 00:37:34,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1503232. Throughput: 0: 945.6. Samples: 374302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:37:34,973][01374] Avg episode reward: [(0, '9.557')] [2024-12-30 00:37:34,979][04339] Saving new best policy, reward=9.557! [2024-12-30 00:37:38,642][04353] Updated weights for policy 0, policy_version 370 (0.0022) [2024-12-30 00:37:39,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1515520. Throughput: 0: 923.2. Samples: 379140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:37:39,975][01374] Avg episode reward: [(0, '9.372')] [2024-12-30 00:37:44,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1536000. Throughput: 0: 893.4. Samples: 384512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:37:44,973][01374] Avg episode reward: [(0, '9.581')] [2024-12-30 00:37:45,057][04339] Saving new best policy, reward=9.581! [2024-12-30 00:37:48,776][04353] Updated weights for policy 0, policy_version 380 (0.0020) [2024-12-30 00:37:49,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1560576. Throughput: 0: 918.8. Samples: 387762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:37:49,974][01374] Avg episode reward: [(0, '10.192')] [2024-12-30 00:37:49,984][04339] Saving new best policy, reward=10.192! [2024-12-30 00:37:54,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 1572864. Throughput: 0: 942.3. Samples: 393462. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:37:54,976][01374] Avg episode reward: [(0, '10.213')] [2024-12-30 00:37:55,043][04339] Saving new best policy, reward=10.213! [2024-12-30 00:37:59,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 1593344. Throughput: 0: 895.9. Samples: 397954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 00:37:59,980][01374] Avg episode reward: [(0, '10.607')] [2024-12-30 00:37:59,989][04339] Saving new best policy, reward=10.607! [2024-12-30 00:38:00,927][04353] Updated weights for policy 0, policy_version 390 (0.0034) [2024-12-30 00:38:04,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1613824. Throughput: 0: 903.7. Samples: 401360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:38:04,973][01374] Avg episode reward: [(0, '10.842')] [2024-12-30 00:38:04,978][04339] Saving new best policy, reward=10.842! [2024-12-30 00:38:09,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1634304. Throughput: 0: 969.4. Samples: 408272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:38:09,975][01374] Avg episode reward: [(0, '10.355')] [2024-12-30 00:38:10,884][04353] Updated weights for policy 0, policy_version 400 (0.0028) [2024-12-30 00:38:14,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 1646592. Throughput: 0: 916.9. Samples: 412368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:38:14,973][01374] Avg episode reward: [(0, '10.068')] [2024-12-30 00:38:19,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1667072. Throughput: 0: 906.5. Samples: 415094. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:38:19,977][01374] Avg episode reward: [(0, '10.032')] [2024-12-30 00:38:20,039][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000408_1671168.pth... [2024-12-30 00:38:20,171][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth [2024-12-30 00:38:21,899][04353] Updated weights for policy 0, policy_version 410 (0.0014) [2024-12-30 00:38:24,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1691648. Throughput: 0: 948.6. Samples: 421828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:38:24,973][01374] Avg episode reward: [(0, '9.685')] [2024-12-30 00:38:29,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1708032. Throughput: 0: 949.5. Samples: 427240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:38:29,973][01374] Avg episode reward: [(0, '10.127')] [2024-12-30 00:38:33,597][04353] Updated weights for policy 0, policy_version 420 (0.0033) [2024-12-30 00:38:34,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1724416. Throughput: 0: 924.1. Samples: 429348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:38:34,974][01374] Avg episode reward: [(0, '10.994')] [2024-12-30 00:38:34,984][04339] Saving new best policy, reward=10.994! [2024-12-30 00:38:39,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1748992. Throughput: 0: 942.2. Samples: 435862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:38:39,978][01374] Avg episode reward: [(0, '12.001')] [2024-12-30 00:38:39,987][04339] Saving new best policy, reward=12.001! [2024-12-30 00:38:42,456][04353] Updated weights for policy 0, policy_version 430 (0.0027) [2024-12-30 00:38:44,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1765376. 
Throughput: 0: 982.8. Samples: 442182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:38:44,986][01374] Avg episode reward: [(0, '12.175')] [2024-12-30 00:38:44,991][04339] Saving new best policy, reward=12.175! [2024-12-30 00:38:49,975][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1781760. Throughput: 0: 949.1. Samples: 444070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:38:49,981][01374] Avg episode reward: [(0, '11.698')] [2024-12-30 00:38:54,272][04353] Updated weights for policy 0, policy_version 440 (0.0016) [2024-12-30 00:38:54,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 1802240. Throughput: 0: 919.9. Samples: 449668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:38:54,978][01374] Avg episode reward: [(0, '11.595')] [2024-12-30 00:38:59,971][01374] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1826816. Throughput: 0: 981.0. Samples: 456514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:38:59,979][01374] Avg episode reward: [(0, '11.922')] [2024-12-30 00:39:04,971][01374] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3776.7). Total num frames: 1839104. Throughput: 0: 976.6. Samples: 459042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:39:04,978][01374] Avg episode reward: [(0, '12.855')] [2024-12-30 00:39:04,983][04339] Saving new best policy, reward=12.855! [2024-12-30 00:39:05,440][04353] Updated weights for policy 0, policy_version 450 (0.0031) [2024-12-30 00:39:09,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1859584. Throughput: 0: 929.8. Samples: 463668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 00:39:09,977][01374] Avg episode reward: [(0, '13.496')] [2024-12-30 00:39:09,985][04339] Saving new best policy, reward=13.496! 
[2024-12-30 00:39:14,971][01374] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1880064. Throughput: 0: 962.2. Samples: 470538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:39:14,979][01374] Avg episode reward: [(0, '12.291')] [2024-12-30 00:39:15,191][04353] Updated weights for policy 0, policy_version 460 (0.0016) [2024-12-30 00:39:19,971][01374] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 1900544. Throughput: 0: 990.4. Samples: 473916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:39:19,973][01374] Avg episode reward: [(0, '12.127')] [2024-12-30 00:39:24,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1912832. Throughput: 0: 939.2. Samples: 478126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:39:24,973][01374] Avg episode reward: [(0, '12.008')] [2024-12-30 00:39:26,792][04353] Updated weights for policy 0, policy_version 470 (0.0015) [2024-12-30 00:39:29,971][01374] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1937408. Throughput: 0: 941.2. Samples: 484538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:39:29,973][01374] Avg episode reward: [(0, '12.780')] [2024-12-30 00:39:34,972][01374] Fps is (10 sec: 4914.4, 60 sec: 3959.4, 300 sec: 3790.5). Total num frames: 1961984. Throughput: 0: 974.9. Samples: 487940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:39:34,977][01374] Avg episode reward: [(0, '14.647')] [2024-12-30 00:39:34,987][04339] Saving new best policy, reward=14.647! [2024-12-30 00:39:36,554][04353] Updated weights for policy 0, policy_version 480 (0.0013) [2024-12-30 00:39:39,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1974272. Throughput: 0: 964.9. Samples: 493088. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:39:39,977][01374] Avg episode reward: [(0, '15.795')] [2024-12-30 00:39:39,986][04339] Saving new best policy, reward=15.795! [2024-12-30 00:39:44,971][01374] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1994752. Throughput: 0: 935.3. Samples: 498602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:39:44,977][01374] Avg episode reward: [(0, '16.681')] [2024-12-30 00:39:44,983][04339] Saving new best policy, reward=16.681! [2024-12-30 00:39:47,569][04353] Updated weights for policy 0, policy_version 490 (0.0020) [2024-12-30 00:39:49,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 2015232. Throughput: 0: 950.7. Samples: 501822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:39:49,975][01374] Avg episode reward: [(0, '15.784')] [2024-12-30 00:39:54,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2031616. Throughput: 0: 985.5. Samples: 508016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:39:54,976][01374] Avg episode reward: [(0, '14.530')] [2024-12-30 00:39:59,067][04353] Updated weights for policy 0, policy_version 500 (0.0031) [2024-12-30 00:39:59,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2048000. Throughput: 0: 930.8. Samples: 512424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:39:59,974][01374] Avg episode reward: [(0, '13.711')] [2024-12-30 00:40:04,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2072576. Throughput: 0: 935.8. Samples: 516026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:40:04,975][01374] Avg episode reward: [(0, '13.718')] [2024-12-30 00:40:07,872][04353] Updated weights for policy 0, policy_version 510 (0.0021) [2024-12-30 00:40:09,973][01374] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3762.7). 
Total num frames: 2093056. Throughput: 0: 996.0. Samples: 522948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 00:40:09,979][01374] Avg episode reward: [(0, '15.017')] [2024-12-30 00:40:14,971][01374] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2109440. Throughput: 0: 951.8. Samples: 527368. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 00:40:14,973][01374] Avg episode reward: [(0, '15.770')] [2024-12-30 00:40:19,679][04353] Updated weights for policy 0, policy_version 520 (0.0021) [2024-12-30 00:40:19,971][01374] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2129920. Throughput: 0: 932.4. Samples: 529896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:40:19,978][01374] Avg episode reward: [(0, '15.822')] [2024-12-30 00:40:19,987][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth... [2024-12-30 00:40:20,111][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth [2024-12-30 00:40:24,971][01374] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2150400. Throughput: 0: 974.4. Samples: 536936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 00:40:24,978][01374] Avg episode reward: [(0, '15.102')] [2024-12-30 00:40:29,972][01374] Fps is (10 sec: 3685.9, 60 sec: 3822.8, 300 sec: 3762.7). Total num frames: 2166784. Throughput: 0: 975.1. Samples: 542482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:40:29,980][01374] Avg episode reward: [(0, '15.398')] [2024-12-30 00:40:30,260][04353] Updated weights for policy 0, policy_version 530 (0.0031) [2024-12-30 00:40:34,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2187264. Throughput: 0: 950.4. Samples: 544588. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:40:34,978][01374] Avg episode reward: [(0, '16.706')] [2024-12-30 00:40:34,982][04339] Saving new best policy, reward=16.706! [2024-12-30 00:40:39,971][01374] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2207744. Throughput: 0: 953.9. Samples: 550940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:40:39,973][01374] Avg episode reward: [(0, '16.987')] [2024-12-30 00:40:39,982][04339] Saving new best policy, reward=16.987! [2024-12-30 00:40:40,406][04353] Updated weights for policy 0, policy_version 540 (0.0040) [2024-12-30 00:40:44,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2228224. Throughput: 0: 1002.0. Samples: 557516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:40:44,975][01374] Avg episode reward: [(0, '17.831')] [2024-12-30 00:40:44,977][04339] Saving new best policy, reward=17.831! [2024-12-30 00:40:49,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2240512. Throughput: 0: 965.2. Samples: 559460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:40:49,974][01374] Avg episode reward: [(0, '18.775')] [2024-12-30 00:40:49,987][04339] Saving new best policy, reward=18.775! [2024-12-30 00:40:52,283][04353] Updated weights for policy 0, policy_version 550 (0.0039) [2024-12-30 00:40:54,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2265088. Throughput: 0: 928.6. Samples: 564734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:40:54,973][01374] Avg episode reward: [(0, '18.375')] [2024-12-30 00:40:59,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2285568. Throughput: 0: 987.7. Samples: 571814. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:40:59,977][01374] Avg episode reward: [(0, '18.529')] [2024-12-30 00:41:00,968][04353] Updated weights for policy 0, policy_version 560 (0.0017) [2024-12-30 00:41:04,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2301952. Throughput: 0: 993.7. Samples: 574614. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 00:41:04,976][01374] Avg episode reward: [(0, '18.133')] [2024-12-30 00:41:09,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 2322432. Throughput: 0: 936.0. Samples: 579056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:41:09,974][01374] Avg episode reward: [(0, '18.732')] [2024-12-30 00:41:12,600][04353] Updated weights for policy 0, policy_version 570 (0.0038) [2024-12-30 00:41:14,971][01374] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2342912. Throughput: 0: 966.0. Samples: 585952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:41:14,979][01374] Avg episode reward: [(0, '17.628')] [2024-12-30 00:41:19,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2363392. Throughput: 0: 994.5. Samples: 589342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:41:19,973][01374] Avg episode reward: [(0, '17.097')] [2024-12-30 00:41:23,995][04353] Updated weights for policy 0, policy_version 580 (0.0025) [2024-12-30 00:41:24,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2375680. Throughput: 0: 952.8. Samples: 593814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:41:24,973][01374] Avg episode reward: [(0, '17.920')] [2024-12-30 00:41:29,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 2400256. Throughput: 0: 944.0. Samples: 599998. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:41:29,973][01374] Avg episode reward: [(0, '17.769')] [2024-12-30 00:41:33,225][04353] Updated weights for policy 0, policy_version 590 (0.0036) [2024-12-30 00:41:34,971][01374] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2424832. Throughput: 0: 977.3. Samples: 603438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:41:34,978][01374] Avg episode reward: [(0, '19.016')] [2024-12-30 00:41:34,979][04339] Saving new best policy, reward=19.016! [2024-12-30 00:41:39,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2437120. Throughput: 0: 979.6. Samples: 608814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:41:39,976][01374] Avg episode reward: [(0, '19.972')] [2024-12-30 00:41:39,984][04339] Saving new best policy, reward=19.972! [2024-12-30 00:41:44,971][01374] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2453504. Throughput: 0: 934.9. Samples: 613884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:41:44,977][01374] Avg episode reward: [(0, '19.087')] [2024-12-30 00:41:44,998][04353] Updated weights for policy 0, policy_version 600 (0.0019) [2024-12-30 00:41:49,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2478080. Throughput: 0: 945.6. Samples: 617164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:41:49,973][01374] Avg episode reward: [(0, '19.042')] [2024-12-30 00:41:54,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2494464. Throughput: 0: 989.7. Samples: 623594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:41:54,975][01374] Avg episode reward: [(0, '18.071')] [2024-12-30 00:41:55,409][04353] Updated weights for policy 0, policy_version 610 (0.0023) [2024-12-30 00:41:59,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). 
Total num frames: 2510848. Throughput: 0: 931.2. Samples: 627856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:41:59,977][01374] Avg episode reward: [(0, '17.760')] [2024-12-30 00:42:04,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2531328. Throughput: 0: 928.1. Samples: 631106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 00:42:04,974][01374] Avg episode reward: [(0, '18.266')] [2024-12-30 00:42:05,965][04353] Updated weights for policy 0, policy_version 620 (0.0016) [2024-12-30 00:42:09,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2555904. Throughput: 0: 983.5. Samples: 638072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:42:09,972][01374] Avg episode reward: [(0, '18.571')] [2024-12-30 00:42:14,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2568192. Throughput: 0: 949.9. Samples: 642744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:42:14,976][01374] Avg episode reward: [(0, '19.084')] [2024-12-30 00:42:17,730][04353] Updated weights for policy 0, policy_version 630 (0.0030) [2024-12-30 00:42:19,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2588672. Throughput: 0: 921.4. Samples: 644900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:42:19,978][01374] Avg episode reward: [(0, '18.797')] [2024-12-30 00:42:19,987][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000632_2588672.pth... [2024-12-30 00:42:20,116][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000408_1671168.pth [2024-12-30 00:42:24,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2613248. Throughput: 0: 957.4. Samples: 651898. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-30 00:42:24,976][01374] Avg episode reward: [(0, '19.145')] [2024-12-30 00:42:26,737][04353] Updated weights for policy 0, policy_version 640 (0.0017) [2024-12-30 00:42:29,973][01374] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 2629632. Throughput: 0: 976.0. Samples: 657806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-30 00:42:29,975][01374] Avg episode reward: [(0, '18.466')] [2024-12-30 00:42:34,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2646016. Throughput: 0: 949.7. Samples: 659900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:42:34,978][01374] Avg episode reward: [(0, '18.692')] [2024-12-30 00:42:38,409][04353] Updated weights for policy 0, policy_version 650 (0.0013) [2024-12-30 00:42:39,971][01374] Fps is (10 sec: 3687.1, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2666496. Throughput: 0: 941.1. Samples: 665942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 00:42:39,979][01374] Avg episode reward: [(0, '18.308')] [2024-12-30 00:42:44,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2691072. Throughput: 0: 999.8. Samples: 672846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:42:44,973][01374] Avg episode reward: [(0, '17.774')] [2024-12-30 00:42:49,223][04353] Updated weights for policy 0, policy_version 660 (0.0026) [2024-12-30 00:42:49,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2703360. Throughput: 0: 974.4. Samples: 674956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:42:49,973][01374] Avg episode reward: [(0, '18.321')] [2024-12-30 00:42:54,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2723840. Throughput: 0: 930.0. Samples: 679920. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:42:54,980][01374] Avg episode reward: [(0, '19.787')] [2024-12-30 00:42:58,773][04353] Updated weights for policy 0, policy_version 670 (0.0033) [2024-12-30 00:42:59,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2748416. Throughput: 0: 983.7. Samples: 687012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:42:59,978][01374] Avg episode reward: [(0, '19.589')] [2024-12-30 00:43:04,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2764800. Throughput: 0: 1004.1. Samples: 690084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:43:04,974][01374] Avg episode reward: [(0, '19.075')] [2024-12-30 00:43:09,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2781184. Throughput: 0: 940.2. Samples: 694208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:43:09,973][01374] Avg episode reward: [(0, '19.500')] [2024-12-30 00:43:10,558][04353] Updated weights for policy 0, policy_version 680 (0.0024) [2024-12-30 00:43:14,971][01374] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2805760. Throughput: 0: 960.5. Samples: 701028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:43:14,974][01374] Avg episode reward: [(0, '20.007')] [2024-12-30 00:43:14,976][04339] Saving new best policy, reward=20.007! [2024-12-30 00:43:19,973][01374] Fps is (10 sec: 4095.1, 60 sec: 3891.0, 300 sec: 3832.2). Total num frames: 2822144. Throughput: 0: 989.8. Samples: 704444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:43:19,975][01374] Avg episode reward: [(0, '19.936')] [2024-12-30 00:43:20,014][04353] Updated weights for policy 0, policy_version 690 (0.0029) [2024-12-30 00:43:24,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2838528. Throughput: 0: 964.0. Samples: 709322. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:43:24,975][01374] Avg episode reward: [(0, '20.313')] [2024-12-30 00:43:24,983][04339] Saving new best policy, reward=20.313! [2024-12-30 00:43:29,971][01374] Fps is (10 sec: 3687.3, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 2859008. Throughput: 0: 939.7. Samples: 715132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:43:29,973][01374] Avg episode reward: [(0, '20.201')] [2024-12-30 00:43:30,988][04353] Updated weights for policy 0, policy_version 700 (0.0018) [2024-12-30 00:43:34,971][01374] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2883584. Throughput: 0: 972.4. Samples: 718716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 00:43:34,976][01374] Avg episode reward: [(0, '19.982')] [2024-12-30 00:43:39,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2899968. Throughput: 0: 997.6. Samples: 724810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:43:39,977][01374] Avg episode reward: [(0, '19.523')] [2024-12-30 00:43:42,031][04353] Updated weights for policy 0, policy_version 710 (0.0025) [2024-12-30 00:43:44,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2920448. Throughput: 0: 947.5. Samples: 729648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:43:44,977][01374] Avg episode reward: [(0, '18.304')] [2024-12-30 00:43:49,971][01374] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2945024. Throughput: 0: 961.6. Samples: 733354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:43:49,973][01374] Avg episode reward: [(0, '18.350')] [2024-12-30 00:43:50,990][04353] Updated weights for policy 0, policy_version 720 (0.0020) [2024-12-30 00:43:54,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2961408. Throughput: 0: 1024.4. Samples: 740304. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:43:54,979][01374] Avg episode reward: [(0, '18.226')] [2024-12-30 00:43:59,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2977792. Throughput: 0: 970.6. Samples: 744706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:43:59,973][01374] Avg episode reward: [(0, '18.469')] [2024-12-30 00:44:02,261][04353] Updated weights for policy 0, policy_version 730 (0.0023) [2024-12-30 00:44:04,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3002368. Throughput: 0: 966.1. Samples: 747918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:44:04,976][01374] Avg episode reward: [(0, '20.368')] [2024-12-30 00:44:04,978][04339] Saving new best policy, reward=20.368! [2024-12-30 00:44:09,971][01374] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3026944. Throughput: 0: 1014.4. Samples: 754972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:44:09,976][01374] Avg episode reward: [(0, '19.764')] [2024-12-30 00:44:11,164][04353] Updated weights for policy 0, policy_version 740 (0.0020) [2024-12-30 00:44:14,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3039232. Throughput: 0: 1005.3. Samples: 760370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:44:14,977][01374] Avg episode reward: [(0, '20.342')] [2024-12-30 00:44:19,973][01374] Fps is (10 sec: 3276.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3059712. Throughput: 0: 973.8. Samples: 762538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:44:19,982][01374] Avg episode reward: [(0, '20.636')] [2024-12-30 00:44:19,996][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth... 
[2024-12-30 00:44:20,116][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth [2024-12-30 00:44:20,135][04339] Saving new best policy, reward=20.636! [2024-12-30 00:44:22,292][04353] Updated weights for policy 0, policy_version 750 (0.0017) [2024-12-30 00:44:24,971][01374] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3084288. Throughput: 0: 991.9. Samples: 769446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:44:24,976][01374] Avg episode reward: [(0, '21.832')] [2024-12-30 00:44:24,978][04339] Saving new best policy, reward=21.832! [2024-12-30 00:44:29,971][01374] Fps is (10 sec: 4097.0, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3100672. Throughput: 0: 1022.0. Samples: 775640. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 00:44:29,973][01374] Avg episode reward: [(0, '21.222')] [2024-12-30 00:44:33,223][04353] Updated weights for policy 0, policy_version 760 (0.0023) [2024-12-30 00:44:34,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3117056. Throughput: 0: 987.4. Samples: 777786. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 00:44:34,973][01374] Avg episode reward: [(0, '21.622')] [2024-12-30 00:44:39,971][01374] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3141632. Throughput: 0: 967.9. Samples: 783858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 00:44:39,978][01374] Avg episode reward: [(0, '21.886')] [2024-12-30 00:44:39,987][04339] Saving new best policy, reward=21.886! [2024-12-30 00:44:42,548][04353] Updated weights for policy 0, policy_version 770 (0.0017) [2024-12-30 00:44:44,971][01374] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3162112. Throughput: 0: 1025.5. Samples: 790854. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:44:44,975][01374] Avg episode reward: [(0, '20.851')] [2024-12-30 00:44:49,971][01374] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3178496. Throughput: 0: 1004.1. Samples: 793102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:44:49,974][01374] Avg episode reward: [(0, '20.357')] [2024-12-30 00:44:54,214][04353] Updated weights for policy 0, policy_version 780 (0.0013) [2024-12-30 00:44:54,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3194880. Throughput: 0: 957.0. Samples: 798036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:44:54,974][01374] Avg episode reward: [(0, '19.227')] [2024-12-30 00:44:59,971][01374] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3219456. Throughput: 0: 994.9. Samples: 805142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:44:59,979][01374] Avg episode reward: [(0, '20.328')] [2024-12-30 00:45:03,676][04353] Updated weights for policy 0, policy_version 790 (0.0016) [2024-12-30 00:45:04,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3235840. Throughput: 0: 1020.3. Samples: 808450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:45:04,976][01374] Avg episode reward: [(0, '20.195')] [2024-12-30 00:45:09,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 3252224. Throughput: 0: 961.1. Samples: 812694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:45:09,973][01374] Avg episode reward: [(0, '19.997')] [2024-12-30 00:45:14,252][04353] Updated weights for policy 0, policy_version 800 (0.0017) [2024-12-30 00:45:14,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3276800. Throughput: 0: 976.9. Samples: 819602. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:45:14,973][01374] Avg episode reward: [(0, '20.100')] [2024-12-30 00:45:19,971][01374] Fps is (10 sec: 4915.1, 60 sec: 4027.9, 300 sec: 3901.6). Total num frames: 3301376. Throughput: 0: 1008.1. Samples: 823150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:45:19,975][01374] Avg episode reward: [(0, '21.967')] [2024-12-30 00:45:19,994][04339] Saving new best policy, reward=21.967! [2024-12-30 00:45:24,971][01374] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3313664. Throughput: 0: 980.8. Samples: 827994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:45:24,978][01374] Avg episode reward: [(0, '21.296')] [2024-12-30 00:45:25,904][04353] Updated weights for policy 0, policy_version 810 (0.0026) [2024-12-30 00:45:29,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3334144. Throughput: 0: 950.2. Samples: 833614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:45:29,973][01374] Avg episode reward: [(0, '21.775')] [2024-12-30 00:45:34,722][04353] Updated weights for policy 0, policy_version 820 (0.0024) [2024-12-30 00:45:34,971][01374] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3358720. Throughput: 0: 978.5. Samples: 837132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:45:34,978][01374] Avg episode reward: [(0, '22.188')] [2024-12-30 00:45:34,983][04339] Saving new best policy, reward=22.188! [2024-12-30 00:45:39,971][01374] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3375104. Throughput: 0: 1006.1. Samples: 843312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:45:39,977][01374] Avg episode reward: [(0, '21.800')] [2024-12-30 00:45:44,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3391488. Throughput: 0: 950.1. Samples: 847898. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:45:44,976][01374] Avg episode reward: [(0, '20.845')] [2024-12-30 00:45:46,281][04353] Updated weights for policy 0, policy_version 830 (0.0026) [2024-12-30 00:45:49,971][01374] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3416064. Throughput: 0: 956.8. Samples: 851506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:45:49,973][01374] Avg episode reward: [(0, '21.646')] [2024-12-30 00:45:54,971][01374] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 3432448. Throughput: 0: 1016.0. Samples: 858414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 00:45:54,973][01374] Avg episode reward: [(0, '20.597')] [2024-12-30 00:45:56,453][04353] Updated weights for policy 0, policy_version 840 (0.0036) [2024-12-30 00:45:59,976][01374] Fps is (10 sec: 3275.2, 60 sec: 3822.6, 300 sec: 3887.7). Total num frames: 3448832. Throughput: 0: 959.0. Samples: 862760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-30 00:45:59,983][01374] Avg episode reward: [(0, '21.866')] [2024-12-30 00:46:04,971][01374] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3473408. Throughput: 0: 944.9. Samples: 865672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:46:04,979][01374] Avg episode reward: [(0, '21.311')] [2024-12-30 00:46:06,678][04353] Updated weights for policy 0, policy_version 850 (0.0038) [2024-12-30 00:46:09,971][01374] Fps is (10 sec: 4917.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3497984. Throughput: 0: 993.0. Samples: 872680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:46:09,973][01374] Avg episode reward: [(0, '20.496')] [2024-12-30 00:46:14,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3510272. Throughput: 0: 988.8. Samples: 878112. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:46:14,977][01374] Avg episode reward: [(0, '19.889')] [2024-12-30 00:46:18,030][04353] Updated weights for policy 0, policy_version 860 (0.0023) [2024-12-30 00:46:19,971][01374] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3530752. Throughput: 0: 958.2. Samples: 880250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:46:19,978][01374] Avg episode reward: [(0, '20.450')] [2024-12-30 00:46:19,989][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth... [2024-12-30 00:46:20,105][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000632_2588672.pth [2024-12-30 00:46:24,971][01374] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3551232. Throughput: 0: 963.6. Samples: 886674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:46:24,978][01374] Avg episode reward: [(0, '20.892')] [2024-12-30 00:46:27,062][04353] Updated weights for policy 0, policy_version 870 (0.0030) [2024-12-30 00:46:29,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3571712. Throughput: 0: 1006.3. Samples: 893182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 00:46:29,977][01374] Avg episode reward: [(0, '21.034')] [2024-12-30 00:46:34,971][01374] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 3584000. Throughput: 0: 973.2. Samples: 895300. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 00:46:34,981][01374] Avg episode reward: [(0, '21.482')] [2024-12-30 00:46:38,578][04353] Updated weights for policy 0, policy_version 880 (0.0019) [2024-12-30 00:46:39,971][01374] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3608576. Throughput: 0: 945.9. Samples: 900978. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:46:39,976][01374] Avg episode reward: [(0, '22.475')] [2024-12-30 00:46:39,984][04339] Saving new best policy, reward=22.475! [2024-12-30 00:46:44,971][01374] Fps is (10 sec: 4915.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3633152. Throughput: 0: 1005.5. Samples: 908002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 00:46:44,974][01374] Avg episode reward: [(0, '24.166')] [2024-12-30 00:46:44,979][04339] Saving new best policy, reward=24.166! [2024-12-30 00:46:48,788][04353] Updated weights for policy 0, policy_version 890 (0.0032) [2024-12-30 00:46:49,971][01374] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3645440. Throughput: 0: 998.0. Samples: 910582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:46:49,975][01374] Avg episode reward: [(0, '22.524')] [2024-12-30 00:46:54,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3665920. Throughput: 0: 944.5. Samples: 915180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:46:54,977][01374] Avg episode reward: [(0, '23.310')] [2024-12-30 00:46:59,180][04353] Updated weights for policy 0, policy_version 900 (0.0016) [2024-12-30 00:46:59,971][01374] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3929.4). Total num frames: 3690496. Throughput: 0: 980.4. Samples: 922228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:46:59,979][01374] Avg episode reward: [(0, '21.472')] [2024-12-30 00:47:04,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3706880. Throughput: 0: 1011.4. Samples: 925764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:04,979][01374] Avg episode reward: [(0, '19.993')] [2024-12-30 00:47:09,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3723264. Throughput: 0: 967.4. Samples: 930206. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 00:47:09,977][01374] Avg episode reward: [(0, '19.885')] [2024-12-30 00:47:10,641][04353] Updated weights for policy 0, policy_version 910 (0.0025) [2024-12-30 00:47:14,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3747840. Throughput: 0: 962.0. Samples: 936474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:14,979][01374] Avg episode reward: [(0, '19.256')] [2024-12-30 00:47:19,284][04353] Updated weights for policy 0, policy_version 920 (0.0028) [2024-12-30 00:47:19,974][01374] Fps is (10 sec: 4504.0, 60 sec: 3959.2, 300 sec: 3915.5). Total num frames: 3768320. Throughput: 0: 993.3. Samples: 940000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 00:47:19,977][01374] Avg episode reward: [(0, '18.782')] [2024-12-30 00:47:24,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3784704. Throughput: 0: 990.2. Samples: 945538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:24,977][01374] Avg episode reward: [(0, '20.687')] [2024-12-30 00:47:29,972][01374] Fps is (10 sec: 3278.0, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3801088. Throughput: 0: 946.0. Samples: 950570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:29,978][01374] Avg episode reward: [(0, '20.840')] [2024-12-30 00:47:31,148][04353] Updated weights for policy 0, policy_version 930 (0.0031) [2024-12-30 00:47:34,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3821568. Throughput: 0: 961.2. Samples: 953836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:34,978][01374] Avg episode reward: [(0, '20.821')] [2024-12-30 00:47:39,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3837952. Throughput: 0: 990.4. Samples: 959746. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:47:39,975][01374] Avg episode reward: [(0, '21.939')] [2024-12-30 00:47:43,064][04353] Updated weights for policy 0, policy_version 940 (0.0027) [2024-12-30 00:47:44,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 3854336. Throughput: 0: 924.0. Samples: 963808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:47:44,973][01374] Avg episode reward: [(0, '22.192')] [2024-12-30 00:47:49,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3874816. Throughput: 0: 915.5. Samples: 966960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:49,978][01374] Avg episode reward: [(0, '22.404')] [2024-12-30 00:47:53,182][04353] Updated weights for policy 0, policy_version 950 (0.0019) [2024-12-30 00:47:54,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3895296. Throughput: 0: 957.2. Samples: 973282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 00:47:54,979][01374] Avg episode reward: [(0, '21.745')] [2024-12-30 00:47:59,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 3907584. Throughput: 0: 917.1. Samples: 977742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:47:59,981][01374] Avg episode reward: [(0, '21.063')] [2024-12-30 00:48:04,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 3928064. Throughput: 0: 886.4. Samples: 979884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:48:04,975][01374] Avg episode reward: [(0, '21.035')] [2024-12-30 00:48:05,431][04353] Updated weights for policy 0, policy_version 960 (0.0030) [2024-12-30 00:48:09,971][01374] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3948544. Throughput: 0: 909.1. Samples: 986448. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 00:48:09,973][01374] Avg episode reward: [(0, '20.948')] [2024-12-30 00:48:14,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3873.9). Total num frames: 3964928. Throughput: 0: 918.1. Samples: 991884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 00:48:14,973][01374] Avg episode reward: [(0, '21.206')] [2024-12-30 00:48:17,067][04353] Updated weights for policy 0, policy_version 970 (0.0032) [2024-12-30 00:48:19,971][01374] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3873.8). Total num frames: 3981312. Throughput: 0: 888.0. Samples: 993796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:48:19,975][01374] Avg episode reward: [(0, '20.985')] [2024-12-30 00:48:19,987][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000972_3981312.pth... [2024-12-30 00:48:20,112][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth [2024-12-30 00:48:24,971][01374] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 4001792. Throughput: 0: 880.8. Samples: 999384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 00:48:24,974][01374] Avg episode reward: [(0, '20.437')] [2024-12-30 00:48:25,609][04339] Stopping Batcher_0... [2024-12-30 00:48:25,609][04339] Loop batcher_evt_loop terminating... [2024-12-30 00:48:25,610][01374] Component Batcher_0 stopped! [2024-12-30 00:48:25,617][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-30 00:48:25,663][04353] Weights refcount: 2 0 [2024-12-30 00:48:25,666][04353] Stopping InferenceWorker_p0-w0... [2024-12-30 00:48:25,667][04353] Loop inference_proc0-0_evt_loop terminating... [2024-12-30 00:48:25,666][01374] Component InferenceWorker_p0-w0 stopped! 
[2024-12-30 00:48:25,749][04339] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth [2024-12-30 00:48:25,761][04339] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-30 00:48:25,979][04339] Stopping LearnerWorker_p0... [2024-12-30 00:48:25,986][04339] Loop learner_proc0_evt_loop terminating... [2024-12-30 00:48:25,981][01374] Component LearnerWorker_p0 stopped! [2024-12-30 00:48:26,071][01374] Component RolloutWorker_w3 stopped! [2024-12-30 00:48:26,075][04357] Stopping RolloutWorker_w3... [2024-12-30 00:48:26,076][04357] Loop rollout_proc3_evt_loop terminating... [2024-12-30 00:48:26,078][01374] Component RolloutWorker_w7 stopped! [2024-12-30 00:48:26,083][04360] Stopping RolloutWorker_w7... [2024-12-30 00:48:26,084][04360] Loop rollout_proc7_evt_loop terminating... [2024-12-30 00:48:26,098][04352] Stopping RolloutWorker_w0... [2024-12-30 00:48:26,098][04352] Loop rollout_proc0_evt_loop terminating... [2024-12-30 00:48:26,095][01374] Component RolloutWorker_w1 stopped! [2024-12-30 00:48:26,102][01374] Component RolloutWorker_w0 stopped! [2024-12-30 00:48:26,109][04354] Stopping RolloutWorker_w1... [2024-12-30 00:48:26,110][04354] Loop rollout_proc1_evt_loop terminating... [2024-12-30 00:48:26,112][04356] Stopping RolloutWorker_w4... [2024-12-30 00:48:26,112][01374] Component RolloutWorker_w4 stopped! [2024-12-30 00:48:26,112][04356] Loop rollout_proc4_evt_loop terminating... [2024-12-30 00:48:26,129][04355] Stopping RolloutWorker_w2... [2024-12-30 00:48:26,129][01374] Component RolloutWorker_w2 stopped! [2024-12-30 00:48:26,140][04355] Loop rollout_proc2_evt_loop terminating... [2024-12-30 00:48:26,145][04358] Stopping RolloutWorker_w5... [2024-12-30 00:48:26,145][01374] Component RolloutWorker_w5 stopped! [2024-12-30 00:48:26,146][04358] Loop rollout_proc5_evt_loop terminating... [2024-12-30 00:48:26,178][01374] Component RolloutWorker_w6 stopped! 
[2024-12-30 00:48:26,180][01374] Waiting for process learner_proc0 to stop...
[2024-12-30 00:48:26,182][04359] Stopping RolloutWorker_w6...
[2024-12-30 00:48:26,182][04359] Loop rollout_proc6_evt_loop terminating...
[2024-12-30 00:48:27,735][01374] Waiting for process inference_proc0-0 to join...
[2024-12-30 00:48:27,739][01374] Waiting for process rollout_proc0 to join...
[2024-12-30 00:48:30,074][01374] Waiting for process rollout_proc1 to join...
[2024-12-30 00:48:30,081][01374] Waiting for process rollout_proc2 to join...
[2024-12-30 00:48:30,086][01374] Waiting for process rollout_proc3 to join...
[2024-12-30 00:48:30,092][01374] Waiting for process rollout_proc4 to join...
[2024-12-30 00:48:30,095][01374] Waiting for process rollout_proc5 to join...
[2024-12-30 00:48:30,100][01374] Waiting for process rollout_proc6 to join...
[2024-12-30 00:48:30,105][01374] Waiting for process rollout_proc7 to join...
[2024-12-30 00:48:30,108][01374] Batcher 0 profile tree view:
batching: 27.3255, releasing_batches: 0.0296
[2024-12-30 00:48:30,111][01374] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 423.7625
update_model: 8.6639
  weight_update: 0.0033
one_step: 0.0137
  handle_policy_step: 577.7178
    deserialize: 14.8934, stack: 3.2117, obs_to_device_normalize: 122.6845, forward: 289.8778, send_messages: 28.7130
    prepare_outputs: 88.5375
      to_cpu: 52.6913
[2024-12-30 00:48:30,116][01374] Learner 0 profile tree view:
misc: 0.0059, prepare_batch: 13.6702
train: 74.1508
  epoch_init: 0.0228, minibatch_init: 0.0081, losses_postprocess: 0.6333, kl_divergence: 0.6432, after_optimizer: 33.0604
  calculate_losses: 27.0488
    losses_init: 0.0105, forward_head: 1.3187, bptt_initial: 18.3050, tail: 1.1676, advantages_returns: 0.2265, losses: 3.8963
    bptt: 1.8337
      bptt_forward_core: 1.7480
  update: 12.1269
    clip: 0.9232
[2024-12-30 00:48:30,117][01374] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3145, enqueue_policy_requests: 104.0427, env_step: 824.9565, overhead: 12.7915, complete_rollouts: 6.4111
save_policy_outputs: 21.1756
  split_output_tensors: 8.3533
[2024-12-30 00:48:30,119][01374] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3459, enqueue_policy_requests: 101.6828, env_step: 821.1953, overhead: 13.2790, complete_rollouts: 7.2932
save_policy_outputs: 20.6409
  split_output_tensors: 8.1365
[2024-12-30 00:48:30,120][01374] Loop Runner_EvtLoop terminating...
[2024-12-30 00:48:30,121][01374] Runner profile tree view:
main_loop: 1083.3138
[2024-12-30 00:48:30,125][01374] Collected {0: 4005888}, FPS: 3697.8
[2024-12-30 00:49:03,469][01374] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 00:49:03,471][01374] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 00:49:03,473][01374] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 00:49:03,475][01374] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 00:49:03,477][01374] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 00:49:03,479][01374] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 00:49:03,480][01374] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 00:49:03,482][01374] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 00:49:03,483][01374] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-30 00:49:03,484][01374] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-30 00:49:03,485][01374] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 00:49:03,486][01374] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 00:49:03,487][01374] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-30 00:49:03,489][01374] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-30 00:49:03,490][01374] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-30 00:49:03,524][01374] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-30 00:49:03,528][01374] RunningMeanStd input shape: (3, 72, 128) [2024-12-30 00:49:03,530][01374] RunningMeanStd input shape: (1,) [2024-12-30 00:49:03,547][01374] ConvEncoder: input_channels=3 [2024-12-30 00:49:03,673][01374] Conv encoder output size: 512 [2024-12-30 00:49:03,675][01374] Policy head output size: 512 [2024-12-30 00:49:03,855][01374] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-30 00:49:04,843][01374] Num frames 100... [2024-12-30 00:49:05,010][01374] Num frames 200... [2024-12-30 00:49:05,171][01374] Num frames 300... [2024-12-30 00:49:05,335][01374] Num frames 400... [2024-12-30 00:49:05,503][01374] Num frames 500... [2024-12-30 00:49:05,672][01374] Num frames 600... [2024-12-30 00:49:05,841][01374] Num frames 700... [2024-12-30 00:49:06,014][01374] Num frames 800... [2024-12-30 00:49:06,189][01374] Num frames 900... [2024-12-30 00:49:06,371][01374] Num frames 1000... [2024-12-30 00:49:06,539][01374] Num frames 1100... [2024-12-30 00:49:06,722][01374] Num frames 1200... [2024-12-30 00:49:06,865][01374] Avg episode rewards: #0: 28.480, true rewards: #0: 12.480 [2024-12-30 00:49:06,867][01374] Avg episode reward: 28.480, avg true_objective: 12.480 [2024-12-30 00:49:06,934][01374] Num frames 1300... [2024-12-30 00:49:07,066][01374] Num frames 1400... [2024-12-30 00:49:07,189][01374] Num frames 1500... [2024-12-30 00:49:07,311][01374] Num frames 1600... [2024-12-30 00:49:07,432][01374] Num frames 1700... [2024-12-30 00:49:07,554][01374] Num frames 1800... 
[2024-12-30 00:49:07,673][01374] Num frames 1900... [2024-12-30 00:49:07,752][01374] Avg episode rewards: #0: 22.100, true rewards: #0: 9.600 [2024-12-30 00:49:07,754][01374] Avg episode reward: 22.100, avg true_objective: 9.600 [2024-12-30 00:49:07,853][01374] Num frames 2000... [2024-12-30 00:49:07,978][01374] Num frames 2100... [2024-12-30 00:49:08,097][01374] Num frames 2200... [2024-12-30 00:49:08,221][01374] Num frames 2300... [2024-12-30 00:49:08,321][01374] Avg episode rewards: #0: 16.787, true rewards: #0: 7.787 [2024-12-30 00:49:08,322][01374] Avg episode reward: 16.787, avg true_objective: 7.787 [2024-12-30 00:49:08,403][01374] Num frames 2400... [2024-12-30 00:49:08,524][01374] Num frames 2500... [2024-12-30 00:49:08,643][01374] Num frames 2600... [2024-12-30 00:49:08,779][01374] Num frames 2700... [2024-12-30 00:49:08,900][01374] Num frames 2800... [2024-12-30 00:49:09,026][01374] Num frames 2900... [2024-12-30 00:49:09,144][01374] Num frames 3000... [2024-12-30 00:49:09,262][01374] Num frames 3100... [2024-12-30 00:49:09,388][01374] Num frames 3200... [2024-12-30 00:49:09,504][01374] Num frames 3300... [2024-12-30 00:49:09,621][01374] Num frames 3400... [2024-12-30 00:49:09,736][01374] Num frames 3500... [2024-12-30 00:49:09,869][01374] Num frames 3600... [2024-12-30 00:49:09,993][01374] Num frames 3700... [2024-12-30 00:49:10,116][01374] Num frames 3800... [2024-12-30 00:49:10,236][01374] Num frames 3900... [2024-12-30 00:49:10,355][01374] Num frames 4000... [2024-12-30 00:49:10,480][01374] Num frames 4100... [2024-12-30 00:49:10,570][01374] Avg episode rewards: #0: 23.570, true rewards: #0: 10.320 [2024-12-30 00:49:10,571][01374] Avg episode reward: 23.570, avg true_objective: 10.320 [2024-12-30 00:49:10,658][01374] Num frames 4200... [2024-12-30 00:49:10,776][01374] Num frames 4300... [2024-12-30 00:49:10,901][01374] Num frames 4400... [2024-12-30 00:49:11,030][01374] Num frames 4500... [2024-12-30 00:49:11,148][01374] Num frames 4600... 
[2024-12-30 00:49:11,266][01374] Num frames 4700... [2024-12-30 00:49:11,363][01374] Avg episode rewards: #0: 20.672, true rewards: #0: 9.472 [2024-12-30 00:49:11,364][01374] Avg episode reward: 20.672, avg true_objective: 9.472 [2024-12-30 00:49:11,445][01374] Num frames 4800... [2024-12-30 00:49:11,562][01374] Num frames 4900... [2024-12-30 00:49:11,682][01374] Num frames 5000... [2024-12-30 00:49:11,803][01374] Num frames 5100... [2024-12-30 00:49:11,928][01374] Num frames 5200... [2024-12-30 00:49:12,053][01374] Num frames 5300... [2024-12-30 00:49:12,169][01374] Num frames 5400... [2024-12-30 00:49:12,290][01374] Num frames 5500... [2024-12-30 00:49:12,407][01374] Num frames 5600... [2024-12-30 00:49:12,523][01374] Num frames 5700... [2024-12-30 00:49:12,683][01374] Avg episode rewards: #0: 21.153, true rewards: #0: 9.653 [2024-12-30 00:49:12,685][01374] Avg episode reward: 21.153, avg true_objective: 9.653 [2024-12-30 00:49:12,698][01374] Num frames 5800... [2024-12-30 00:49:12,826][01374] Num frames 5900... [2024-12-30 00:49:12,952][01374] Num frames 6000... [2024-12-30 00:49:13,075][01374] Num frames 6100... [2024-12-30 00:49:13,192][01374] Num frames 6200... [2024-12-30 00:49:13,313][01374] Num frames 6300... [2024-12-30 00:49:13,434][01374] Num frames 6400... [2024-12-30 00:49:13,556][01374] Num frames 6500... [2024-12-30 00:49:13,672][01374] Num frames 6600... [2024-12-30 00:49:13,790][01374] Num frames 6700... [2024-12-30 00:49:13,919][01374] Num frames 6800... [2024-12-30 00:49:14,043][01374] Num frames 6900... [2024-12-30 00:49:14,160][01374] Num frames 7000... [2024-12-30 00:49:14,279][01374] Num frames 7100... [2024-12-30 00:49:14,408][01374] Num frames 7200... [2024-12-30 00:49:14,541][01374] Avg episode rewards: #0: 23.377, true rewards: #0: 10.377 [2024-12-30 00:49:14,542][01374] Avg episode reward: 23.377, avg true_objective: 10.377 [2024-12-30 00:49:14,591][01374] Num frames 7300... [2024-12-30 00:49:14,706][01374] Num frames 7400... 
[2024-12-30 00:49:14,825][01374] Num frames 7500... [2024-12-30 00:49:14,954][01374] Num frames 7600... [2024-12-30 00:49:15,081][01374] Num frames 7700... [2024-12-30 00:49:15,203][01374] Num frames 7800... [2024-12-30 00:49:15,322][01374] Num frames 7900... [2024-12-30 00:49:15,445][01374] Num frames 8000... [2024-12-30 00:49:15,564][01374] Num frames 8100... [2024-12-30 00:49:15,680][01374] Num frames 8200... [2024-12-30 00:49:15,804][01374] Num frames 8300... [2024-12-30 00:49:15,923][01374] Num frames 8400... [2024-12-30 00:49:16,057][01374] Num frames 8500... [2024-12-30 00:49:16,129][01374] Avg episode rewards: #0: 24.766, true rewards: #0: 10.641 [2024-12-30 00:49:16,131][01374] Avg episode reward: 24.766, avg true_objective: 10.641 [2024-12-30 00:49:16,235][01374] Num frames 8600... [2024-12-30 00:49:16,354][01374] Num frames 8700... [2024-12-30 00:49:16,476][01374] Num frames 8800... [2024-12-30 00:49:16,597][01374] Num frames 8900... [2024-12-30 00:49:16,717][01374] Num frames 9000... [2024-12-30 00:49:16,854][01374] Num frames 9100... [2024-12-30 00:49:17,037][01374] Num frames 9200... [2024-12-30 00:49:17,204][01374] Num frames 9300... [2024-12-30 00:49:17,376][01374] Num frames 9400... [2024-12-30 00:49:17,541][01374] Num frames 9500... [2024-12-30 00:49:17,704][01374] Num frames 9600... [2024-12-30 00:49:17,864][01374] Num frames 9700... [2024-12-30 00:49:18,033][01374] Num frames 9800... [2024-12-30 00:49:18,201][01374] Num frames 9900... [2024-12-30 00:49:18,403][01374] Avg episode rewards: #0: 25.872, true rewards: #0: 11.094 [2024-12-30 00:49:18,406][01374] Avg episode reward: 25.872, avg true_objective: 11.094 [2024-12-30 00:49:18,435][01374] Num frames 10000... [2024-12-30 00:49:18,603][01374] Num frames 10100... [2024-12-30 00:49:18,784][01374] Num frames 10200... [2024-12-30 00:49:18,963][01374] Num frames 10300... [2024-12-30 00:49:19,146][01374] Num frames 10400... [2024-12-30 00:49:19,322][01374] Num frames 10500... 
[2024-12-30 00:49:19,448][01374] Num frames 10600... [2024-12-30 00:49:19,566][01374] Num frames 10700... [2024-12-30 00:49:19,682][01374] Num frames 10800... [2024-12-30 00:49:19,807][01374] Num frames 10900... [2024-12-30 00:49:19,952][01374] Avg episode rewards: #0: 25.477, true rewards: #0: 10.977 [2024-12-30 00:49:19,953][01374] Avg episode reward: 25.477, avg true_objective: 10.977 [2024-12-30 00:50:28,014][01374] Replay video saved to /content/train_dir/default_experiment/replay.mp4!