[2025-01-10 10:36:13,741][01485] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-01-10 10:36:13,744][01485] Rollout worker 0 uses device cpu
[2025-01-10 10:36:13,746][01485] Rollout worker 1 uses device cpu
[2025-01-10 10:36:13,747][01485] Rollout worker 2 uses device cpu
[2025-01-10 10:36:13,748][01485] Rollout worker 3 uses device cpu
[2025-01-10 10:36:13,749][01485] Rollout worker 4 uses device cpu
[2025-01-10 10:36:13,750][01485] Rollout worker 5 uses device cpu
[2025-01-10 10:36:13,751][01485] Rollout worker 6 uses device cpu
[2025-01-10 10:36:13,752][01485] Rollout worker 7 uses device cpu
[2025-01-10 10:36:13,894][01485] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-10 10:36:13,895][01485] InferenceWorker_p0-w0: min num requests: 2
[2025-01-10 10:36:13,927][01485] Starting all processes...
[2025-01-10 10:36:13,928][01485] Starting process learner_proc0
[2025-01-10 10:36:13,972][01485] Starting all processes...
[2025-01-10 10:36:13,978][01485] Starting process inference_proc0-0
[2025-01-10 10:36:13,979][01485] Starting process rollout_proc0
[2025-01-10 10:36:13,980][01485] Starting process rollout_proc1
[2025-01-10 10:36:13,980][01485] Starting process rollout_proc2
[2025-01-10 10:36:13,981][01485] Starting process rollout_proc3
[2025-01-10 10:36:13,981][01485] Starting process rollout_proc4
[2025-01-10 10:36:13,981][01485] Starting process rollout_proc5
[2025-01-10 10:36:13,981][01485] Starting process rollout_proc6
[2025-01-10 10:36:13,981][01485] Starting process rollout_proc7
[2025-01-10 10:36:30,335][03568] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-10 10:36:30,335][03568] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-10 10:36:30,511][03568] Num visible devices: 1
[2025-01-10 10:36:31,023][03569] Worker 0 uses CPU cores [0]
[2025-01-10 10:36:31,067][03575] Worker 7 uses CPU cores [1]
[2025-01-10 10:36:31,328][03571] Worker 2 uses CPU cores [0]
[2025-01-10 10:36:31,348][03570] Worker 1 uses CPU cores [1]
[2025-01-10 10:36:31,365][03555] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-10 10:36:31,365][03555] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-01-10 10:36:31,426][03555] Num visible devices: 1
[2025-01-10 10:36:31,436][03572] Worker 3 uses CPU cores [1]
[2025-01-10 10:36:31,437][03573] Worker 5 uses CPU cores [1]
[2025-01-10 10:36:31,450][03555] Starting seed is not provided
[2025-01-10 10:36:31,451][03555] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-10 10:36:31,451][03555] Initializing actor-critic model on device cuda:0
[2025-01-10 10:36:31,452][03555] RunningMeanStd input shape: (3, 72, 128)
[2025-01-10 10:36:31,457][03555] RunningMeanStd input shape: (1,)
[2025-01-10 10:36:31,513][03555] ConvEncoder: input_channels=3
[2025-01-10 10:36:31,514][03574] Worker 4 uses CPU cores [0]
[2025-01-10 10:36:31,569][03576] Worker 6 uses CPU cores [0]
[2025-01-10 10:36:31,849][03555] Conv encoder output size: 512
[2025-01-10 10:36:31,850][03555] Policy head output size: 512
[2025-01-10 10:36:31,917][03555] Created Actor Critic model with architecture:
[2025-01-10 10:36:31,917][03555] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-01-10 10:36:32,240][03555] Using optimizer
[2025-01-10 10:36:33,890][01485] Heartbeat connected on Batcher_0
[2025-01-10 10:36:33,896][01485] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-10 10:36:33,902][01485] Heartbeat connected on RolloutWorker_w0
[2025-01-10 10:36:33,907][01485] Heartbeat connected on RolloutWorker_w1
[2025-01-10 10:36:33,910][01485] Heartbeat connected on RolloutWorker_w2
[2025-01-10 10:36:33,913][01485] Heartbeat connected on RolloutWorker_w3
[2025-01-10 10:36:33,916][01485] Heartbeat connected on RolloutWorker_w4
[2025-01-10 10:36:33,919][01485] Heartbeat connected on RolloutWorker_w5
[2025-01-10 10:36:33,923][01485] Heartbeat connected on RolloutWorker_w6
[2025-01-10 10:36:33,926][01485] Heartbeat connected on RolloutWorker_w7
[2025-01-10 10:36:35,859][03555] No checkpoints found
[2025-01-10 10:36:35,859][03555] Did not load from checkpoint, starting from scratch!
[2025-01-10 10:36:35,859][03555] Initialized policy 0 weights for model version 0
[2025-01-10 10:36:35,863][03555] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-10 10:36:35,869][03555] LearnerWorker_p0 finished initialization!
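The sizes printed in the architecture summary above can be sanity-checked by hand. The conv kernel/stride values below are assumptions (Sample Factory's default VizDoom encoder uses three convs of 32/64/128 channels); only the (3, 72, 128) input shape, the 512-dim outputs, and GRU(512, 512) are taken from the log itself.

```python
# Sanity-check the sizes printed in the architecture summary above.
# The (channels, kernel, stride) triples are assumptions for illustration;
# the input shape (3, 72, 128) and the 512-dim sizes come from the log.

def conv_out(size, kernel, stride):
    """Output length of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

h, w = 72, 128  # RunningMeanStd input shape (3, 72, 128)
for channels, kernel, stride in [(32, 8, 4), (64, 4, 2), (128, 3, 2)]:
    h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)

flat = 128 * h * w  # flattened conv features fed to the Linear mlp layer
print(flat)         # -> 2304 under the assumed kernels/strides

# Parameter count of the GRU(512, 512) core, using PyTorch's standard
# parameterization: weight_ih (3H x I), weight_hh (3H x H), two biases (3H each).
H = I = 512
gru_params = 3 * H * I + 3 * H * H + 3 * H + 3 * H
print(gru_params)   # -> 1575936
```

The MLP then maps the flattened conv features to the 512-dim encoder output reported as "Conv encoder output size: 512".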
[2025-01-10 10:36:35,871][01485] Heartbeat connected on LearnerWorker_p0
[2025-01-10 10:36:35,959][03568] RunningMeanStd input shape: (3, 72, 128)
[2025-01-10 10:36:35,961][03568] RunningMeanStd input shape: (1,)
[2025-01-10 10:36:35,973][03568] ConvEncoder: input_channels=3
[2025-01-10 10:36:36,079][03568] Conv encoder output size: 512
[2025-01-10 10:36:36,079][03568] Policy head output size: 512
[2025-01-10 10:36:36,131][01485] Inference worker 0-0 is ready!
[2025-01-10 10:36:36,133][01485] All inference workers are ready! Signal rollout workers to start!
[2025-01-10 10:36:36,329][03574] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,325][03573] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,334][03569] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,332][03572] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,335][03576] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,331][03571] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,331][03570] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,338][03575] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-10 10:36:36,969][03575] Decorrelating experience for 0 frames...
[2025-01-10 10:36:37,375][03575] Decorrelating experience for 32 frames...
[2025-01-10 10:36:37,566][03571] Decorrelating experience for 0 frames...
[2025-01-10 10:36:37,568][03569] Decorrelating experience for 0 frames...
[2025-01-10 10:36:37,579][03574] Decorrelating experience for 0 frames...
[2025-01-10 10:36:38,013][03570] Decorrelating experience for 0 frames...
[2025-01-10 10:36:38,617][03571] Decorrelating experience for 32 frames...
[2025-01-10 10:36:38,613][03576] Decorrelating experience for 0 frames...
[2025-01-10 10:36:38,639][03574] Decorrelating experience for 32 frames...
[2025-01-10 10:36:38,743][03570] Decorrelating experience for 32 frames...
[2025-01-10 10:36:38,789][03572] Decorrelating experience for 0 frames...
[2025-01-10 10:36:38,946][01485] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-10 10:36:39,900][03576] Decorrelating experience for 32 frames...
[2025-01-10 10:36:39,922][03569] Decorrelating experience for 32 frames...
[2025-01-10 10:36:39,960][03573] Decorrelating experience for 0 frames...
[2025-01-10 10:36:39,962][03572] Decorrelating experience for 32 frames...
[2025-01-10 10:36:40,215][03571] Decorrelating experience for 64 frames...
[2025-01-10 10:36:40,349][03570] Decorrelating experience for 64 frames...
[2025-01-10 10:36:41,028][03576] Decorrelating experience for 64 frames...
[2025-01-10 10:36:41,049][03569] Decorrelating experience for 64 frames...
[2025-01-10 10:36:41,280][03575] Decorrelating experience for 64 frames...
[2025-01-10 10:36:41,307][03573] Decorrelating experience for 32 frames...
[2025-01-10 10:36:41,820][03570] Decorrelating experience for 96 frames...
[2025-01-10 10:36:42,126][03571] Decorrelating experience for 96 frames...
[2025-01-10 10:36:42,228][03576] Decorrelating experience for 96 frames...
[2025-01-10 10:36:42,478][03574] Decorrelating experience for 64 frames...
[2025-01-10 10:36:42,802][03572] Decorrelating experience for 64 frames...
[2025-01-10 10:36:43,072][03569] Decorrelating experience for 96 frames...
[2025-01-10 10:36:43,093][03575] Decorrelating experience for 96 frames...
[2025-01-10 10:36:43,322][03573] Decorrelating experience for 64 frames...
[2025-01-10 10:36:43,946][01485] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-10 10:36:44,158][03572] Decorrelating experience for 96 frames...
[2025-01-10 10:36:45,438][03574] Decorrelating experience for 96 frames...
[2025-01-10 10:36:45,651][03573] Decorrelating experience for 96 frames...
[2025-01-10 10:36:48,946][01485] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 211.6. Samples: 2116. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-10 10:36:48,957][01485] Avg episode reward: [(0, '1.815')]
[2025-01-10 10:36:49,196][03555] Signal inference workers to stop experience collection...
[2025-01-10 10:36:49,217][03568] InferenceWorker_p0-w0: stopping experience collection
[2025-01-10 10:36:51,875][03555] Signal inference workers to resume experience collection...
[2025-01-10 10:36:51,876][03568] InferenceWorker_p0-w0: resuming experience collection
[2025-01-10 10:36:53,946][01485] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 173.2. Samples: 2598. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-01-10 10:36:53,952][01485] Avg episode reward: [(0, '2.742')]
[2025-01-10 10:36:58,946][01485] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 333.2. Samples: 6664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:36:58,953][01485] Avg episode reward: [(0, '3.767')]
[2025-01-10 10:37:01,328][03568] Updated weights for policy 0, policy_version 10 (0.0036)
[2025-01-10 10:37:03,946][01485] Fps is (10 sec: 3276.8, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 488.0. Samples: 12200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:37:03,950][01485] Avg episode reward: [(0, '4.360')]
[2025-01-10 10:37:08,946][01485] Fps is (10 sec: 3686.4, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 482.9. Samples: 14486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:37:08,953][01485] Avg episode reward: [(0, '4.359')]
[2025-01-10 10:37:11,726][03568] Updated weights for policy 0, policy_version 20 (0.0029)
[2025-01-10 10:37:13,946][01485] Fps is (10 sec: 4505.6, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 612.9. Samples: 21452. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:37:13,948][01485] Avg episode reward: [(0, '4.328')]
[2025-01-10 10:37:18,946][01485] Fps is (10 sec: 4505.6, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 702.1. Samples: 28086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:37:18,950][01485] Avg episode reward: [(0, '4.337')]
[2025-01-10 10:37:18,960][03555] Saving new best policy, reward=4.337!
[2025-01-10 10:37:22,811][03568] Updated weights for policy 0, policy_version 30 (0.0020)
[2025-01-10 10:37:23,946][01485] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 670.2. Samples: 30158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:37:23,952][01485] Avg episode reward: [(0, '4.563')]
[2025-01-10 10:37:23,955][03555] Saving new best policy, reward=4.563!
[2025-01-10 10:37:28,946][01485] Fps is (10 sec: 3686.4, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 798.0. Samples: 35912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:37:28,953][01485] Avg episode reward: [(0, '4.602')]
[2025-01-10 10:37:28,962][03555] Saving new best policy, reward=4.602!
[2025-01-10 10:37:31,910][03568] Updated weights for policy 0, policy_version 40 (0.0026)
[2025-01-10 10:37:33,947][01485] Fps is (10 sec: 4914.8, 60 sec: 3127.8, 300 sec: 3127.8). Total num frames: 172032. Throughput: 0: 913.1. Samples: 43204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:37:33,952][01485] Avg episode reward: [(0, '4.414')]
[2025-01-10 10:37:38,946][01485] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 959.8. Samples: 45788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:37:38,948][01485] Avg episode reward: [(0, '4.484')]
[2025-01-10 10:37:42,820][03568] Updated weights for policy 0, policy_version 50 (0.0027)
[2025-01-10 10:37:43,946][01485] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 208896. Throughput: 0: 979.7. Samples: 50750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:37:43,951][01485] Avg episode reward: [(0, '4.382')]
[2025-01-10 10:37:48,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 1018.0. Samples: 58010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:37:48,949][01485] Avg episode reward: [(0, '4.337')]
[2025-01-10 10:37:51,600][03568] Updated weights for policy 0, policy_version 60 (0.0028)
[2025-01-10 10:37:53,947][01485] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 1045.4. Samples: 61528. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:37:53,949][01485] Avg episode reward: [(0, '4.406')]
[2025-01-10 10:37:58,946][01485] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 987.1. Samples: 65872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-01-10 10:37:58,949][01485] Avg episode reward: [(0, '4.429')]
[2025-01-10 10:38:02,718][03568] Updated weights for policy 0, policy_version 70 (0.0020)
[2025-01-10 10:38:03,946][01485] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3421.4). Total num frames: 290816. Throughput: 0: 994.2. Samples: 72824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:38:03,948][01485] Avg episode reward: [(0, '4.511')]
[2025-01-10 10:38:08,946][01485] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3458.8). Total num frames: 311296. Throughput: 0: 1029.0. Samples: 76464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:38:08,952][01485] Avg episode reward: [(0, '4.275')]
[2025-01-10 10:38:08,983][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
[2025-01-10 10:38:13,324][03568] Updated weights for policy 0, policy_version 80 (0.0033)
[2025-01-10 10:38:13,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3449.3). Total num frames: 327680. Throughput: 0: 1016.6. Samples: 81660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:38:13,948][01485] Avg episode reward: [(0, '4.307')]
[2025-01-10 10:38:18,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 992.1. Samples: 87846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:38:18,949][01485] Avg episode reward: [(0, '4.334')]
[2025-01-10 10:38:22,074][03568] Updated weights for policy 0, policy_version 90 (0.0025)
[2025-01-10 10:38:23,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3588.9). Total num frames: 376832. Throughput: 0: 1015.8. Samples: 91500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:38:23,950][01485] Avg episode reward: [(0, '4.496')]
[2025-01-10 10:38:28,948][01485] Fps is (10 sec: 4095.2, 60 sec: 4095.9, 300 sec: 3574.6). Total num frames: 393216. Throughput: 0: 1041.0. Samples: 97598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:38:28,953][01485] Avg episode reward: [(0, '4.718')]
[2025-01-10 10:38:28,958][03555] Saving new best policy, reward=4.718!
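The recurring `Fps is (...)` and `Avg episode reward` lines above are easy to scrape if you want to plot throughput or reward over time. A minimal stdlib sketch; the regexes simply match the line format shown in this log, nothing Sample-Factory-specific is assumed:

```python
import re

# Matches throughput lines, e.g.
# "Fps is (10 sec: 4505.6, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592."
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.na]+), 60 sec: ([\d.na]+), 300 sec: ([\d.na]+)\)\. "
    r"Total num frames: (\d+)\."
)
# Matches reward lines, e.g. "Avg episode reward: [(0, '4.337')]"
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")

def parse_line(line):
    """Return a small dict for FPS/reward lines, or None for other log lines."""
    if m := FPS_RE.search(line):
        return {"fps_10s": float(m.group(1)), "frames": int(m.group(4))}
    if m := REWARD_RE.search(line):
        return {"reward": float(m.group(1))}
    return None

sample = ("[2025-01-10 10:37:18,946][01485] Fps is (10 sec: 4505.6, "
          "60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592.")
print(parse_line(sample))  # -> {'fps_10s': 4505.6, 'frames': 110592}
```

Feeding the whole log through `parse_line` line by line yields a time series of frames/FPS and episode rewards (the early `nan` entries parse to `float('nan')`).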
[2025-01-10 10:38:33,453][03568] Updated weights for policy 0, policy_version 100 (0.0018)
[2025-01-10 10:38:33,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3561.7). Total num frames: 409600. Throughput: 0: 989.7. Samples: 102546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:38:33,948][01485] Avg episode reward: [(0, '4.528')]
[2025-01-10 10:38:38,946][01485] Fps is (10 sec: 4096.8, 60 sec: 4096.0, 300 sec: 3618.1). Total num frames: 434176. Throughput: 0: 991.5. Samples: 106144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:38:38,953][01485] Avg episode reward: [(0, '4.500')]
[2025-01-10 10:38:41,840][03568] Updated weights for policy 0, policy_version 110 (0.0026)
[2025-01-10 10:38:43,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3637.2). Total num frames: 454656. Throughput: 0: 1054.4. Samples: 113318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:38:43,951][01485] Avg episode reward: [(0, '4.670')]
[2025-01-10 10:38:48,947][01485] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3623.4). Total num frames: 471040. Throughput: 0: 998.9. Samples: 117774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:38:48,950][01485] Avg episode reward: [(0, '4.816')]
[2025-01-10 10:38:48,960][03555] Saving new best policy, reward=4.816!
[2025-01-10 10:38:53,065][03568] Updated weights for policy 0, policy_version 120 (0.0026)
[2025-01-10 10:38:53,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3671.2). Total num frames: 495616. Throughput: 0: 988.0. Samples: 120922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:38:53,950][01485] Avg episode reward: [(0, '4.808')]
[2025-01-10 10:38:58,946][01485] Fps is (10 sec: 4506.1, 60 sec: 4164.3, 300 sec: 3686.4). Total num frames: 516096. Throughput: 0: 1036.2. Samples: 128288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:38:58,954][01485] Avg episode reward: [(0, '4.816')]
[2025-01-10 10:39:03,336][03568] Updated weights for policy 0, policy_version 130 (0.0029)
[2025-01-10 10:39:03,949][01485] Fps is (10 sec: 3685.1, 60 sec: 4027.5, 300 sec: 3672.2). Total num frames: 532480. Throughput: 0: 1012.8. Samples: 133426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:39:03,952][01485] Avg episode reward: [(0, '5.065')]
[2025-01-10 10:39:03,956][03555] Saving new best policy, reward=5.065!
[2025-01-10 10:39:08,947][01485] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 552960. Throughput: 0: 984.2. Samples: 135788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:39:08,948][01485] Avg episode reward: [(0, '5.057')]
[2025-01-10 10:39:12,742][03568] Updated weights for policy 0, policy_version 140 (0.0022)
[2025-01-10 10:39:13,946][01485] Fps is (10 sec: 4507.1, 60 sec: 4164.3, 300 sec: 3726.0). Total num frames: 577536. Throughput: 0: 1010.8. Samples: 143084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:39:13,951][01485] Avg episode reward: [(0, '4.796')]
[2025-01-10 10:39:18,946][01485] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3712.0). Total num frames: 593920. Throughput: 0: 1036.7. Samples: 149198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:39:18,948][01485] Avg episode reward: [(0, '4.613')]
[2025-01-10 10:39:23,946][01485] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3698.8). Total num frames: 610304. Throughput: 0: 1003.2. Samples: 151286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:39:23,951][01485] Avg episode reward: [(0, '4.800')]
[2025-01-10 10:39:24,087][03568] Updated weights for policy 0, policy_version 150 (0.0035)
[2025-01-10 10:39:28,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3734.6). Total num frames: 634880. Throughput: 0: 992.4. Samples: 157976. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:39:28,949][01485] Avg episode reward: [(0, '5.189')]
[2025-01-10 10:39:28,961][03555] Saving new best policy, reward=5.189!
[2025-01-10 10:39:32,446][03568] Updated weights for policy 0, policy_version 160 (0.0019)
[2025-01-10 10:39:33,946][01485] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 3768.3). Total num frames: 659456. Throughput: 0: 1046.4. Samples: 164862. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:39:33,951][01485] Avg episode reward: [(0, '5.016')]
[2025-01-10 10:39:38,947][01485] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3731.9). Total num frames: 671744. Throughput: 0: 1024.1. Samples: 167008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:39:38,950][01485] Avg episode reward: [(0, '5.050')]
[2025-01-10 10:39:43,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3741.8). Total num frames: 692224. Throughput: 0: 980.4. Samples: 172408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:39:43,948][01485] Avg episode reward: [(0, '5.079')]
[2025-01-10 10:39:44,992][03568] Updated weights for policy 0, policy_version 170 (0.0030)
[2025-01-10 10:39:48,946][01485] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3729.5). Total num frames: 708608. Throughput: 0: 964.3. Samples: 176816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:39:48,952][01485] Avg episode reward: [(0, '5.056')]
[2025-01-10 10:39:53,947][01485] Fps is (10 sec: 2866.9, 60 sec: 3754.6, 300 sec: 3696.9). Total num frames: 720896. Throughput: 0: 972.5. Samples: 179552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:39:53,956][01485] Avg episode reward: [(0, '5.445')]
[2025-01-10 10:39:53,962][03555] Saving new best policy, reward=5.445!
[2025-01-10 10:39:57,605][03568] Updated weights for policy 0, policy_version 180 (0.0040)
[2025-01-10 10:39:58,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3706.9). Total num frames: 741376. Throughput: 0: 917.6. Samples: 184378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:39:58,952][01485] Avg episode reward: [(0, '5.801')]
[2025-01-10 10:39:58,959][03555] Saving new best policy, reward=5.801!
[2025-01-10 10:40:03,946][01485] Fps is (10 sec: 4506.1, 60 sec: 3891.4, 300 sec: 3736.4). Total num frames: 765952. Throughput: 0: 942.4. Samples: 191608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:03,952][01485] Avg episode reward: [(0, '5.831')]
[2025-01-10 10:40:03,954][03555] Saving new best policy, reward=5.831!
[2025-01-10 10:40:06,024][03568] Updated weights for policy 0, policy_version 190 (0.0026)
[2025-01-10 10:40:08,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 977.2. Samples: 195258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:40:08,951][01485] Avg episode reward: [(0, '5.883')]
[2025-01-10 10:40:08,963][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
[2025-01-10 10:40:09,138][03555] Saving new best policy, reward=5.883!
[2025-01-10 10:40:13,950][01485] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3734.0). Total num frames: 802816. Throughput: 0: 924.9. Samples: 199602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:13,955][01485] Avg episode reward: [(0, '6.129')]
[2025-01-10 10:40:13,957][03555] Saving new best policy, reward=6.129!
[2025-01-10 10:40:17,301][03568] Updated weights for policy 0, policy_version 200 (0.0019)
[2025-01-10 10:40:18,947][01485] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3742.2). Total num frames: 823296. Throughput: 0: 924.1. Samples: 206446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:40:18,952][01485] Avg episode reward: [(0, '6.439')]
[2025-01-10 10:40:18,969][03555] Saving new best policy, reward=6.439!
[2025-01-10 10:40:23,946][01485] Fps is (10 sec: 4507.3, 60 sec: 3959.5, 300 sec: 3768.3). Total num frames: 847872. Throughput: 0: 956.4. Samples: 210044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:23,950][01485] Avg episode reward: [(0, '6.292')]
[2025-01-10 10:40:27,254][03568] Updated weights for policy 0, policy_version 210 (0.0023)
[2025-01-10 10:40:28,946][01485] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3757.6). Total num frames: 864256. Throughput: 0: 957.5. Samples: 215498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:40:28,952][01485] Avg episode reward: [(0, '6.328')]
[2025-01-10 10:40:33,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3764.8). Total num frames: 884736. Throughput: 0: 985.2. Samples: 221150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:33,954][01485] Avg episode reward: [(0, '6.806')]
[2025-01-10 10:40:33,957][03555] Saving new best policy, reward=6.806!
[2025-01-10 10:40:37,068][03568] Updated weights for policy 0, policy_version 220 (0.0020)
[2025-01-10 10:40:38,946][01485] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3788.8). Total num frames: 909312. Throughput: 0: 1003.0. Samples: 224688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:40:38,951][01485] Avg episode reward: [(0, '6.865')]
[2025-01-10 10:40:38,967][03555] Saving new best policy, reward=6.865!
[2025-01-10 10:40:43,946][01485] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3778.4). Total num frames: 925696. Throughput: 0: 1034.6. Samples: 230934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:43,955][01485] Avg episode reward: [(0, '6.697')]
[2025-01-10 10:40:48,390][03568] Updated weights for policy 0, policy_version 230 (0.0016)
[2025-01-10 10:40:48,947][01485] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3768.3). Total num frames: 942080. Throughput: 0: 984.3. Samples: 235902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:48,953][01485] Avg episode reward: [(0, '7.403')]
[2025-01-10 10:40:48,962][03555] Saving new best policy, reward=7.403!
[2025-01-10 10:40:53,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3790.8). Total num frames: 966656. Throughput: 0: 982.2. Samples: 239458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:40:53,952][01485] Avg episode reward: [(0, '7.089')]
[2025-01-10 10:40:56,853][03568] Updated weights for policy 0, policy_version 240 (0.0017)
[2025-01-10 10:40:58,947][01485] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3796.7). Total num frames: 987136. Throughput: 0: 1047.2. Samples: 246722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:40:58,953][01485] Avg episode reward: [(0, '6.681')]
[2025-01-10 10:41:03,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3786.9). Total num frames: 1003520. Throughput: 0: 993.6. Samples: 251156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:41:03,952][01485] Avg episode reward: [(0, '7.065')]
[2025-01-10 10:41:07,788][03568] Updated weights for policy 0, policy_version 250 (0.0031)
[2025-01-10 10:41:08,946][01485] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 3807.8). Total num frames: 1028096. Throughput: 0: 987.5. Samples: 254482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:41:08,953][01485] Avg episode reward: [(0, '7.387')]
[2025-01-10 10:41:13,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.5, 300 sec: 3827.9). Total num frames: 1052672. Throughput: 0: 1030.0. Samples: 261848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:41:13,950][01485] Avg episode reward: [(0, '7.489')]
[2025-01-10 10:41:13,954][03555] Saving new best policy, reward=7.489!
[2025-01-10 10:41:17,857][03568] Updated weights for policy 0, policy_version 260 (0.0027)
[2025-01-10 10:41:18,951][01485] Fps is (10 sec: 3684.8, 60 sec: 4027.5, 300 sec: 3803.4). Total num frames: 1064960. Throughput: 0: 1017.3. Samples: 266932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:41:18,953][01485] Avg episode reward: [(0, '8.112')]
[2025-01-10 10:41:18,963][03555] Saving new best policy, reward=8.112!
[2025-01-10 10:41:23,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3808.6). Total num frames: 1085440. Throughput: 0: 991.4. Samples: 269300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:41:23,953][01485] Avg episode reward: [(0, '8.281')]
[2025-01-10 10:41:23,956][03555] Saving new best policy, reward=8.281!
[2025-01-10 10:41:27,633][03568] Updated weights for policy 0, policy_version 270 (0.0014)
[2025-01-10 10:41:28,946][01485] Fps is (10 sec: 4507.6, 60 sec: 4096.0, 300 sec: 3827.6). Total num frames: 1110016. Throughput: 0: 1012.4. Samples: 276494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:41:28,948][01485] Avg episode reward: [(0, '9.270')]
[2025-01-10 10:41:28,960][03555] Saving new best policy, reward=9.270!
[2025-01-10 10:41:33,948][01485] Fps is (10 sec: 4095.1, 60 sec: 4027.6, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 1031.3. Samples: 282312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:41:33,951][01485] Avg episode reward: [(0, '8.963')]
[2025-01-10 10:41:38,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 998.5. Samples: 284390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:41:38,948][01485] Avg episode reward: [(0, '9.814')]
[2025-01-10 10:41:39,008][03555] Saving new best policy, reward=9.814!
[2025-01-10 10:41:39,017][03568] Updated weights for policy 0, policy_version 280 (0.0021)
[2025-01-10 10:41:43,946][01485] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 984.7. Samples: 291032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:41:43,948][01485] Avg episode reward: [(0, '9.776')]
[2025-01-10 10:41:47,684][03568] Updated weights for policy 0, policy_version 290 (0.0030)
[2025-01-10 10:41:48,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1187840. Throughput: 0: 1037.1. Samples: 297824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:41:48,950][01485] Avg episode reward: [(0, '10.067')]
[2025-01-10 10:41:49,042][03555] Saving new best policy, reward=10.067!
[2025-01-10 10:41:53,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1204224. Throughput: 0: 1009.5. Samples: 299908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:41:53,951][01485] Avg episode reward: [(0, '9.386')]
[2025-01-10 10:41:58,881][03568] Updated weights for policy 0, policy_version 300 (0.0020)
[2025-01-10 10:41:58,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 1228800. Throughput: 0: 971.6. Samples: 305572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:41:58,952][01485] Avg episode reward: [(0, '9.788')]
[2025-01-10 10:42:03,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1249280. Throughput: 0: 1018.8. Samples: 312772. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:42:03,948][01485] Avg episode reward: [(0, '9.586')]
[2025-01-10 10:42:08,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1265664. Throughput: 0: 1030.9. Samples: 315690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:42:08,952][01485] Avg episode reward: [(0, '10.143')]
[2025-01-10 10:42:08,967][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth...
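Right after `checkpoint_000000309_1265664.pth` is written above, the learner removes the oldest file (`checkpoint_000000077_315392.pth`), i.e. it keeps only the newest few checkpoints (best-policy snapshots are stored separately). A minimal sketch of that keep-the-last-N policy; the `keep_last=2` value and the split into a pure selection helper are illustrative assumptions, only the `checkpoint_<version>_<env_steps>.pth` naming comes from the log:

```python
import re
from pathlib import Path

# Naming scheme seen in the log: checkpoint_<policy_version>_<env_steps>.pth
CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")

def stale_checkpoints(names, keep_last=2):
    """Given checkpoint file names, return the ones to delete,
    keeping the `keep_last` newest by policy version."""
    ckpts = sorted(
        (n for n in names if CKPT_RE.fullmatch(n)),
        key=lambda n: int(CKPT_RE.fullmatch(n).group(1)),
    )
    return ckpts[:-keep_last] if keep_last else ckpts

def prune(ckpt_dir, keep_last=2):
    """Delete all but the newest `keep_last` checkpoints in a directory."""
    names = [p.name for p in Path(ckpt_dir).glob("checkpoint_*.pth")]
    for name in stale_checkpoints(names, keep_last):
        (Path(ckpt_dir) / name).unlink()
```

With the three checkpoint names from this log and `keep_last=2`, `stale_checkpoints` selects exactly `checkpoint_000000077_315392.pth` for removal, matching the `Removing ...` line that follows.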
[2025-01-10 10:42:09,138][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth [2025-01-10 10:42:09,161][03555] Saving new best policy, reward=10.143! [2025-01-10 10:42:09,442][03568] Updated weights for policy 0, policy_version 310 (0.0035) [2025-01-10 10:42:13,946][01485] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1286144. Throughput: 0: 972.6. Samples: 320260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:42:13,948][01485] Avg episode reward: [(0, '10.260')] [2025-01-10 10:42:13,951][03555] Saving new best policy, reward=10.260! [2025-01-10 10:42:18,871][03568] Updated weights for policy 0, policy_version 320 (0.0029) [2025-01-10 10:42:18,946][01485] Fps is (10 sec: 4505.5, 60 sec: 4096.3, 300 sec: 4026.6). Total num frames: 1310720. Throughput: 0: 1003.5. Samples: 327466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 10:42:18,949][01485] Avg episode reward: [(0, '10.479')] [2025-01-10 10:42:18,958][03555] Saving new best policy, reward=10.479! [2025-01-10 10:42:23,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1327104. Throughput: 0: 1035.0. Samples: 330966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 10:42:23,949][01485] Avg episode reward: [(0, '10.410')] [2025-01-10 10:42:28,946][01485] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1343488. Throughput: 0: 984.8. Samples: 335350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:42:28,955][01485] Avg episode reward: [(0, '10.407')] [2025-01-10 10:42:30,465][03568] Updated weights for policy 0, policy_version 330 (0.0024) [2025-01-10 10:42:33,949][01485] Fps is (10 sec: 4094.8, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1368064. Throughput: 0: 979.8. Samples: 341918. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 10:42:33,951][01485] Avg episode reward: [(0, '11.229')] [2025-01-10 10:42:33,958][03555] Saving new best policy, reward=11.229! [2025-01-10 10:42:38,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1388544. Throughput: 0: 1013.6. Samples: 345522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:42:38,956][01485] Avg episode reward: [(0, '11.399')] [2025-01-10 10:42:39,036][03555] Saving new best policy, reward=11.399! [2025-01-10 10:42:39,047][03568] Updated weights for policy 0, policy_version 340 (0.0033) [2025-01-10 10:42:43,946][01485] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1404928. Throughput: 0: 1006.4. Samples: 350862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:42:43,951][01485] Avg episode reward: [(0, '11.863')] [2025-01-10 10:42:43,955][03555] Saving new best policy, reward=11.863! [2025-01-10 10:42:48,946][01485] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1425408. Throughput: 0: 973.6. Samples: 356586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:42:48,949][01485] Avg episode reward: [(0, '11.307')] [2025-01-10 10:42:50,037][03568] Updated weights for policy 0, policy_version 350 (0.0021) [2025-01-10 10:42:53,948][01485] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 1449984. Throughput: 0: 988.9. Samples: 360192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:42:53,950][01485] Avg episode reward: [(0, '12.219')] [2025-01-10 10:42:53,956][03555] Saving new best policy, reward=12.219! [2025-01-10 10:42:58,946][01485] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1466368. Throughput: 0: 1028.6. Samples: 366548. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 10:42:58,948][01485] Avg episode reward: [(0, '12.411')] [2025-01-10 10:42:58,961][03555] Saving new best policy, reward=12.411! [2025-01-10 10:43:00,802][03568] Updated weights for policy 0, policy_version 360 (0.0020) [2025-01-10 10:43:03,946][01485] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1486848. Throughput: 0: 976.2. Samples: 371394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:43:03,951][01485] Avg episode reward: [(0, '13.594')] [2025-01-10 10:43:03,955][03555] Saving new best policy, reward=13.594! [2025-01-10 10:43:08,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1511424. Throughput: 0: 979.5. Samples: 375042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:43:08,949][01485] Avg episode reward: [(0, '14.239')] [2025-01-10 10:43:08,962][03555] Saving new best policy, reward=14.239! [2025-01-10 10:43:09,801][03568] Updated weights for policy 0, policy_version 370 (0.0015) [2025-01-10 10:43:13,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1531904. Throughput: 0: 1044.7. Samples: 382362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:43:13,950][01485] Avg episode reward: [(0, '15.238')] [2025-01-10 10:43:13,956][03555] Saving new best policy, reward=15.238! [2025-01-10 10:43:18,947][01485] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1544192. Throughput: 0: 995.8. Samples: 386724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:43:18,952][01485] Avg episode reward: [(0, '16.329')] [2025-01-10 10:43:18,961][03555] Saving new best policy, reward=16.329! [2025-01-10 10:43:21,142][03568] Updated weights for policy 0, policy_version 380 (0.0024) [2025-01-10 10:43:23,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1568768. Throughput: 0: 981.1. Samples: 389670. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:43:23,948][01485] Avg episode reward: [(0, '16.573')] [2025-01-10 10:43:23,952][03555] Saving new best policy, reward=16.573! [2025-01-10 10:43:28,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 1593344. Throughput: 0: 1022.0. Samples: 396854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:43:28,948][01485] Avg episode reward: [(0, '17.571')] [2025-01-10 10:43:28,956][03555] Saving new best policy, reward=17.571! [2025-01-10 10:43:30,069][03568] Updated weights for policy 0, policy_version 390 (0.0025) [2025-01-10 10:43:33,957][01485] Fps is (10 sec: 3682.6, 60 sec: 3959.0, 300 sec: 3970.9). Total num frames: 1605632. Throughput: 0: 1012.6. Samples: 402164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:43:33,965][01485] Avg episode reward: [(0, '16.021')] [2025-01-10 10:43:38,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1626112. Throughput: 0: 982.7. Samples: 404414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:43:38,948][01485] Avg episode reward: [(0, '15.731')] [2025-01-10 10:43:40,861][03568] Updated weights for policy 0, policy_version 400 (0.0016) [2025-01-10 10:43:43,946][01485] Fps is (10 sec: 4100.2, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1646592. Throughput: 0: 1000.0. Samples: 411548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:43:43,953][01485] Avg episode reward: [(0, '17.022')] [2025-01-10 10:43:48,947][01485] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1658880. Throughput: 0: 982.0. Samples: 415586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:43:48,949][01485] Avg episode reward: [(0, '16.700')] [2025-01-10 10:43:53,946][01485] Fps is (10 sec: 2867.3, 60 sec: 3754.8, 300 sec: 3929.4). Total num frames: 1675264. Throughput: 0: 943.9. Samples: 417516. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:43:53,949][01485] Avg episode reward: [(0, '17.417')] [2025-01-10 10:43:54,675][03568] Updated weights for policy 0, policy_version 410 (0.0033) [2025-01-10 10:43:58,947][01485] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1699840. Throughput: 0: 912.5. Samples: 423424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:43:58,949][01485] Avg episode reward: [(0, '18.777')] [2025-01-10 10:43:58,971][03555] Saving new best policy, reward=18.777! [2025-01-10 10:44:03,280][03568] Updated weights for policy 0, policy_version 420 (0.0017) [2025-01-10 10:44:03,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1720320. Throughput: 0: 972.7. Samples: 430494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:03,954][01485] Avg episode reward: [(0, '17.850')] [2025-01-10 10:44:08,949][01485] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3929.4). Total num frames: 1736704. Throughput: 0: 967.9. Samples: 433226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:08,951][01485] Avg episode reward: [(0, '17.344')] [2025-01-10 10:44:08,964][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth... [2025-01-10 10:44:09,132][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth [2025-01-10 10:44:13,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1757184. Throughput: 0: 917.0. Samples: 438120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:44:13,948][01485] Avg episode reward: [(0, '17.228')] [2025-01-10 10:44:14,423][03568] Updated weights for policy 0, policy_version 430 (0.0029) [2025-01-10 10:44:18,946][01485] Fps is (10 sec: 4506.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1781760. Throughput: 0: 961.1. Samples: 445404. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:44:18,948][01485] Avg episode reward: [(0, '17.359')] [2025-01-10 10:44:23,947][01485] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1798144. Throughput: 0: 988.8. Samples: 448910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:44:23,951][01485] Avg episode reward: [(0, '16.914')] [2025-01-10 10:44:24,216][03568] Updated weights for policy 0, policy_version 440 (0.0038) [2025-01-10 10:44:28,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 1814528. Throughput: 0: 925.3. Samples: 453186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:28,951][01485] Avg episode reward: [(0, '17.542')] [2025-01-10 10:44:33,948][01485] Fps is (10 sec: 4095.8, 60 sec: 3891.8, 300 sec: 3957.1). Total num frames: 1839104. Throughput: 0: 989.1. Samples: 460098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:44:33,950][01485] Avg episode reward: [(0, '18.127')] [2025-01-10 10:44:34,243][03568] Updated weights for policy 0, policy_version 450 (0.0025) [2025-01-10 10:44:38,953][01485] Fps is (10 sec: 4911.6, 60 sec: 3959.0, 300 sec: 3970.9). Total num frames: 1863680. Throughput: 0: 1025.4. Samples: 463668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:38,956][01485] Avg episode reward: [(0, '18.589')] [2025-01-10 10:44:43,946][01485] Fps is (10 sec: 3686.9, 60 sec: 3823.0, 300 sec: 3957.2). Total num frames: 1875968. Throughput: 0: 1007.8. Samples: 468774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:43,950][01485] Avg episode reward: [(0, '18.512')] [2025-01-10 10:44:45,278][03568] Updated weights for policy 0, policy_version 460 (0.0033) [2025-01-10 10:44:48,946][01485] Fps is (10 sec: 3689.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1900544. Throughput: 0: 989.4. Samples: 475016. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:48,949][01485] Avg episode reward: [(0, '17.376')] [2025-01-10 10:44:53,774][03568] Updated weights for policy 0, policy_version 470 (0.0025) [2025-01-10 10:44:53,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 1925120. Throughput: 0: 1010.4. Samples: 478690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:44:53,950][01485] Avg episode reward: [(0, '16.930')] [2025-01-10 10:44:58,949][01485] Fps is (10 sec: 4095.0, 60 sec: 4027.6, 300 sec: 3984.9). Total num frames: 1941504. Throughput: 0: 1035.4. Samples: 484718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:44:58,957][01485] Avg episode reward: [(0, '17.423')] [2025-01-10 10:45:03,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1957888. Throughput: 0: 990.8. Samples: 489990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:45:03,951][01485] Avg episode reward: [(0, '16.824')] [2025-01-10 10:45:04,924][03568] Updated weights for policy 0, policy_version 480 (0.0020) [2025-01-10 10:45:08,946][01485] Fps is (10 sec: 4097.0, 60 sec: 4096.2, 300 sec: 3998.9). Total num frames: 1982464. Throughput: 0: 993.9. Samples: 493636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:45:08,950][01485] Avg episode reward: [(0, '17.735')] [2025-01-10 10:45:13,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2002944. Throughput: 0: 1053.5. Samples: 500592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:13,948][01485] Avg episode reward: [(0, '19.610')] [2025-01-10 10:45:13,950][03555] Saving new best policy, reward=19.610! [2025-01-10 10:45:14,822][03568] Updated weights for policy 0, policy_version 490 (0.0016) [2025-01-10 10:45:18,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2019328. Throughput: 0: 996.6. Samples: 504942. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 10:45:18,950][01485] Avg episode reward: [(0, '20.253')] [2025-01-10 10:45:18,958][03555] Saving new best policy, reward=20.253! [2025-01-10 10:45:23,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3998.8). Total num frames: 2043904. Throughput: 0: 994.4. Samples: 508410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:23,952][01485] Avg episode reward: [(0, '20.839')] [2025-01-10 10:45:23,956][03555] Saving new best policy, reward=20.839! [2025-01-10 10:45:24,697][03568] Updated weights for policy 0, policy_version 500 (0.0014) [2025-01-10 10:45:28,946][01485] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2064384. Throughput: 0: 1039.6. Samples: 515554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:28,952][01485] Avg episode reward: [(0, '21.612')] [2025-01-10 10:45:28,974][03555] Saving new best policy, reward=21.612! [2025-01-10 10:45:33,948][01485] Fps is (10 sec: 3685.9, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2080768. Throughput: 0: 1009.4. Samples: 520442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:33,954][01485] Avg episode reward: [(0, '22.753')] [2025-01-10 10:45:33,961][03555] Saving new best policy, reward=22.753! [2025-01-10 10:45:36,022][03568] Updated weights for policy 0, policy_version 510 (0.0045) [2025-01-10 10:45:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3984.9). Total num frames: 2101248. Throughput: 0: 982.0. Samples: 522882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:45:38,948][01485] Avg episode reward: [(0, '20.687')] [2025-01-10 10:45:43,946][01485] Fps is (10 sec: 4506.2, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 2125824. Throughput: 0: 1012.9. Samples: 530294. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:43,954][01485] Avg episode reward: [(0, '18.960')] [2025-01-10 10:45:44,413][03568] Updated weights for policy 0, policy_version 520 (0.0026) [2025-01-10 10:45:48,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 2142208. Throughput: 0: 1029.6. Samples: 536320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:45:48,949][01485] Avg episode reward: [(0, '18.524')] [2025-01-10 10:45:53,949][01485] Fps is (10 sec: 3685.5, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 2162688. Throughput: 0: 996.7. Samples: 538492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:53,955][01485] Avg episode reward: [(0, '17.320')] [2025-01-10 10:45:55,621][03568] Updated weights for policy 0, policy_version 530 (0.0024) [2025-01-10 10:45:58,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 4012.7). Total num frames: 2187264. Throughput: 0: 992.5. Samples: 545254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:45:58,952][01485] Avg episode reward: [(0, '16.496')] [2025-01-10 10:46:03,946][01485] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2203648. Throughput: 0: 1044.3. Samples: 551936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:46:03,952][01485] Avg episode reward: [(0, '17.367')] [2025-01-10 10:46:05,256][03568] Updated weights for policy 0, policy_version 540 (0.0030) [2025-01-10 10:46:08,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2220032. Throughput: 0: 1014.5. Samples: 554064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 10:46:08,949][01485] Avg episode reward: [(0, '16.599')] [2025-01-10 10:46:08,961][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth... 
[2025-01-10 10:46:09,124][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth [2025-01-10 10:46:13,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.9). Total num frames: 2244608. Throughput: 0: 981.5. Samples: 559722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 10:46:13,949][01485] Avg episode reward: [(0, '17.966')] [2025-01-10 10:46:15,644][03568] Updated weights for policy 0, policy_version 550 (0.0014) [2025-01-10 10:46:18,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2265088. Throughput: 0: 1026.6. Samples: 566638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:46:18,948][01485] Avg episode reward: [(0, '18.308')] [2025-01-10 10:46:23,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2281472. Throughput: 0: 1033.6. Samples: 569392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:46:23,952][01485] Avg episode reward: [(0, '18.075')] [2025-01-10 10:46:26,955][03568] Updated weights for policy 0, policy_version 560 (0.0030) [2025-01-10 10:46:28,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 2301952. Throughput: 0: 976.4. Samples: 574230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:46:28,949][01485] Avg episode reward: [(0, '18.151')] [2025-01-10 10:46:33,952][01485] Fps is (10 sec: 4503.0, 60 sec: 4095.7, 300 sec: 4012.6). Total num frames: 2326528. Throughput: 0: 1004.5. Samples: 581528. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 10:46:33,955][01485] Avg episode reward: [(0, '18.499')] [2025-01-10 10:46:35,447][03568] Updated weights for policy 0, policy_version 570 (0.0023) [2025-01-10 10:46:38,947][01485] Fps is (10 sec: 4505.0, 60 sec: 4095.9, 300 sec: 3998.8). Total num frames: 2347008. Throughput: 0: 1037.5. Samples: 585176. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:46:38,950][01485] Avg episode reward: [(0, '17.674')] [2025-01-10 10:46:43,948][01485] Fps is (10 sec: 3278.2, 60 sec: 3891.1, 300 sec: 3971.0). Total num frames: 2359296. Throughput: 0: 987.0. Samples: 589672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:46:43,955][01485] Avg episode reward: [(0, '16.524')] [2025-01-10 10:46:46,566][03568] Updated weights for policy 0, policy_version 580 (0.0022) [2025-01-10 10:46:48,946][01485] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2383872. Throughput: 0: 989.7. Samples: 596472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:46:48,948][01485] Avg episode reward: [(0, '16.389')] [2025-01-10 10:46:53,949][01485] Fps is (10 sec: 4914.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2408448. Throughput: 0: 1024.0. Samples: 600146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:46:53,951][01485] Avg episode reward: [(0, '16.626')] [2025-01-10 10:46:55,852][03568] Updated weights for policy 0, policy_version 590 (0.0017) [2025-01-10 10:46:58,951][01485] Fps is (10 sec: 4093.9, 60 sec: 3959.1, 300 sec: 3984.9). Total num frames: 2424832. Throughput: 0: 1019.9. Samples: 605624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:46:58,954][01485] Avg episode reward: [(0, '16.776')] [2025-01-10 10:47:03,946][01485] Fps is (10 sec: 3687.5, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2445312. Throughput: 0: 995.3. Samples: 611428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 10:47:03,952][01485] Avg episode reward: [(0, '18.891')] [2025-01-10 10:47:05,992][03568] Updated weights for policy 0, policy_version 600 (0.0032) [2025-01-10 10:47:08,946][01485] Fps is (10 sec: 4507.9, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 2469888. Throughput: 0: 1016.1. Samples: 615118. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:47:08,953][01485] Avg episode reward: [(0, '20.116')] [2025-01-10 10:47:13,946][01485] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2486272. Throughput: 0: 1049.3. Samples: 621448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 10:47:13,951][01485] Avg episode reward: [(0, '20.766')] [2025-01-10 10:47:17,195][03568] Updated weights for policy 0, policy_version 610 (0.0017) [2025-01-10 10:47:18,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2502656. Throughput: 0: 996.1. Samples: 626348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:47:18,950][01485] Avg episode reward: [(0, '21.677')] [2025-01-10 10:47:23,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2527232. Throughput: 0: 996.1. Samples: 629998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:47:23,948][01485] Avg episode reward: [(0, '22.510')] [2025-01-10 10:47:25,722][03568] Updated weights for policy 0, policy_version 620 (0.0022) [2025-01-10 10:47:28,949][01485] Fps is (10 sec: 4504.3, 60 sec: 4095.8, 300 sec: 3998.8). Total num frames: 2547712. Throughput: 0: 1055.3. Samples: 637160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:47:28,953][01485] Avg episode reward: [(0, '21.602')] [2025-01-10 10:47:33,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3984.9). Total num frames: 2564096. Throughput: 0: 1001.6. Samples: 641546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:47:33,951][01485] Avg episode reward: [(0, '21.552')] [2025-01-10 10:47:36,871][03568] Updated weights for policy 0, policy_version 630 (0.0024) [2025-01-10 10:47:38,946][01485] Fps is (10 sec: 4097.1, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 2588672. Throughput: 0: 991.9. Samples: 644778. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:47:38,949][01485] Avg episode reward: [(0, '21.482')] [2025-01-10 10:47:43,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4012.7). Total num frames: 2609152. Throughput: 0: 1028.8. Samples: 651914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:47:43,949][01485] Avg episode reward: [(0, '21.118')] [2025-01-10 10:47:48,359][03568] Updated weights for policy 0, policy_version 640 (0.0029) [2025-01-10 10:47:48,946][01485] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2621440. Throughput: 0: 985.8. Samples: 655790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:47:48,949][01485] Avg episode reward: [(0, '20.080')] [2025-01-10 10:47:53,946][01485] Fps is (10 sec: 2457.6, 60 sec: 3754.9, 300 sec: 3957.2). Total num frames: 2633728. Throughput: 0: 943.0. Samples: 657554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:47:53,952][01485] Avg episode reward: [(0, '19.686')] [2025-01-10 10:47:58,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3971.0). Total num frames: 2658304. Throughput: 0: 938.1. Samples: 663662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 10:47:58,951][01485] Avg episode reward: [(0, '20.508')] [2025-01-10 10:47:59,463][03568] Updated weights for policy 0, policy_version 650 (0.0022) [2025-01-10 10:48:03,946][01485] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2682880. Throughput: 0: 986.5. Samples: 670742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:48:03,952][01485] Avg episode reward: [(0, '18.610')] [2025-01-10 10:48:08,946][01485] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 2695168. Throughput: 0: 956.4. Samples: 673034. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:48:08,949][01485] Avg episode reward: [(0, '18.328')] [2025-01-10 10:48:08,958][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth... [2025-01-10 10:48:09,132][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth [2025-01-10 10:48:10,779][03568] Updated weights for policy 0, policy_version 660 (0.0030) [2025-01-10 10:48:13,947][01485] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 2715648. Throughput: 0: 910.5. Samples: 678130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:13,950][01485] Avg episode reward: [(0, '19.472')] [2025-01-10 10:48:18,946][01485] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2740224. Throughput: 0: 975.2. Samples: 685432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:18,950][01485] Avg episode reward: [(0, '20.148')] [2025-01-10 10:48:19,193][03568] Updated weights for policy 0, policy_version 670 (0.0013) [2025-01-10 10:48:23,946][01485] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2756608. Throughput: 0: 974.4. Samples: 688624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:48:23,950][01485] Avg episode reward: [(0, '18.662')] [2025-01-10 10:48:28,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3971.2). Total num frames: 2777088. Throughput: 0: 912.1. Samples: 692958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:48:28,951][01485] Avg episode reward: [(0, '19.419')] [2025-01-10 10:48:30,337][03568] Updated weights for policy 0, policy_version 680 (0.0019) [2025-01-10 10:48:33,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2801664. Throughput: 0: 987.4. Samples: 700222. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:33,948][01485] Avg episode reward: [(0, '19.430')] [2025-01-10 10:48:38,948][01485] Fps is (10 sec: 4504.6, 60 sec: 3891.1, 300 sec: 3984.9). Total num frames: 2822144. Throughput: 0: 1028.6. Samples: 703844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:38,954][01485] Avg episode reward: [(0, '19.911')] [2025-01-10 10:48:39,970][03568] Updated weights for policy 0, policy_version 690 (0.0030) [2025-01-10 10:48:43,950][01485] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3984.9). Total num frames: 2834432. Throughput: 0: 1002.6. Samples: 708782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:43,956][01485] Avg episode reward: [(0, '19.649')] [2025-01-10 10:48:48,946][01485] Fps is (10 sec: 3687.2, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2859008. Throughput: 0: 990.0. Samples: 715290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:48,952][01485] Avg episode reward: [(0, '20.312')] [2025-01-10 10:48:49,878][03568] Updated weights for policy 0, policy_version 700 (0.0028) [2025-01-10 10:48:53,948][01485] Fps is (10 sec: 4916.3, 60 sec: 4164.2, 300 sec: 4012.7). Total num frames: 2883584. Throughput: 0: 1020.1. Samples: 718938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:53,951][01485] Avg episode reward: [(0, '21.819')] [2025-01-10 10:48:58,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2899968. Throughput: 0: 1035.8. Samples: 724740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:48:58,948][01485] Avg episode reward: [(0, '21.100')] [2025-01-10 10:49:00,913][03568] Updated weights for policy 0, policy_version 710 (0.0035) [2025-01-10 10:49:03,946][01485] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2920448. Throughput: 0: 998.8. Samples: 730378. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:49:03,951][01485] Avg episode reward: [(0, '21.268')]
[2025-01-10 10:49:08,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2945024. Throughput: 0: 1006.1. Samples: 733900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:49:08,950][01485] Avg episode reward: [(0, '20.760')]
[2025-01-10 10:49:09,369][03568] Updated weights for policy 0, policy_version 720 (0.0032)
[2025-01-10 10:49:13,948][01485] Fps is (10 sec: 4095.3, 60 sec: 4095.9, 300 sec: 3998.8). Total num frames: 2961408. Throughput: 0: 1060.0. Samples: 740658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:49:13,951][01485] Avg episode reward: [(0, '21.737')]
[2025-01-10 10:49:18,946][01485] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2981888. Throughput: 0: 1003.2. Samples: 745368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:49:18,949][01485] Avg episode reward: [(0, '20.980')]
[2025-01-10 10:49:20,440][03568] Updated weights for policy 0, policy_version 730 (0.0037)
[2025-01-10 10:49:23,946][01485] Fps is (10 sec: 4506.4, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 3006464. Throughput: 0: 1005.7. Samples: 749100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:49:23,950][01485] Avg episode reward: [(0, '20.694')]
[2025-01-10 10:49:28,946][01485] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 3026944. Throughput: 0: 1054.6. Samples: 756236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:49:28,953][01485] Avg episode reward: [(0, '19.839')]
[2025-01-10 10:49:29,896][03568] Updated weights for policy 0, policy_version 740 (0.0022)
[2025-01-10 10:49:33,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3985.0). Total num frames: 3039232. Throughput: 0: 1012.0. Samples: 760830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:49:33,951][01485] Avg episode reward: [(0, '20.041')]
[2025-01-10 10:49:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 3063808. Throughput: 0: 993.7. Samples: 763652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:49:38,949][01485] Avg episode reward: [(0, '20.258')]
[2025-01-10 10:49:40,232][03568] Updated weights for policy 0, policy_version 750 (0.0021)
[2025-01-10 10:49:43,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4232.8, 300 sec: 4026.6). Total num frames: 3088384. Throughput: 0: 1030.4. Samples: 771106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:49:43,952][01485] Avg episode reward: [(0, '19.800')]
[2025-01-10 10:49:48,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 3104768. Throughput: 0: 1030.6. Samples: 776756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:49:48,952][01485] Avg episode reward: [(0, '19.415')]
[2025-01-10 10:49:50,889][03568] Updated weights for policy 0, policy_version 760 (0.0020)
[2025-01-10 10:49:53,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 3125248. Throughput: 0: 1002.8. Samples: 779026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:49:53,948][01485] Avg episode reward: [(0, '20.651')]
[2025-01-10 10:49:58,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 3149824. Throughput: 0: 1010.9. Samples: 786146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:49:58,949][01485] Avg episode reward: [(0, '21.080')]
[2025-01-10 10:49:59,479][03568] Updated weights for policy 0, policy_version 770 (0.0027)
[2025-01-10 10:50:03,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 3170304. Throughput: 0: 1056.1. Samples: 792892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:50:03,950][01485] Avg episode reward: [(0, '20.138')]
[2025-01-10 10:50:08,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3186688. Throughput: 0: 1020.2. Samples: 795008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:50:08,948][01485] Avg episode reward: [(0, '21.647')]
[2025-01-10 10:50:08,957][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000778_3186688.pth...
[2025-01-10 10:50:09,082][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth
[2025-01-10 10:50:10,702][03568] Updated weights for policy 0, policy_version 780 (0.0022)
[2025-01-10 10:50:13,946][01485] Fps is (10 sec: 3686.3, 60 sec: 4096.1, 300 sec: 4026.6). Total num frames: 3207168. Throughput: 0: 995.6. Samples: 801036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:50:13,949][01485] Avg episode reward: [(0, '21.363')]
[2025-01-10 10:50:18,949][01485] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4026.5). Total num frames: 3231744. Throughput: 0: 1057.3. Samples: 808410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:50:18,952][01485] Avg episode reward: [(0, '21.359')]
[2025-01-10 10:50:19,213][03568] Updated weights for policy 0, policy_version 790 (0.0021)
[2025-01-10 10:50:23,946][01485] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3248128. Throughput: 0: 1050.9. Samples: 810944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:50:23,948][01485] Avg episode reward: [(0, '21.563')]
[2025-01-10 10:50:28,946][01485] Fps is (10 sec: 3687.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3268608. Throughput: 0: 997.5. Samples: 815992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:50:28,949][01485] Avg episode reward: [(0, '22.450')]
[2025-01-10 10:50:30,251][03568] Updated weights for policy 0, policy_version 800 (0.0024)
[2025-01-10 10:50:33,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4040.5). Total num frames: 3293184. Throughput: 0: 1036.6. Samples: 823402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:50:33,950][01485] Avg episode reward: [(0, '22.467')]
[2025-01-10 10:50:38,950][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3309568. Throughput: 0: 1059.8. Samples: 826716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:50:38,953][01485] Avg episode reward: [(0, '23.987')]
[2025-01-10 10:50:38,963][03555] Saving new best policy, reward=23.987!
[2025-01-10 10:50:40,491][03568] Updated weights for policy 0, policy_version 810 (0.0021)
[2025-01-10 10:50:43,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3330048. Throughput: 0: 997.1. Samples: 831016. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:50:43,950][01485] Avg episode reward: [(0, '24.914')]
[2025-01-10 10:50:43,953][03555] Saving new best policy, reward=24.914!
[2025-01-10 10:50:48,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3350528. Throughput: 0: 1003.3. Samples: 838040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:50:48,953][01485] Avg episode reward: [(0, '25.467')]
[2025-01-10 10:50:48,967][03555] Saving new best policy, reward=25.467!
[2025-01-10 10:50:50,002][03568] Updated weights for policy 0, policy_version 820 (0.0024)
[2025-01-10 10:50:53,947][01485] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 4026.6). Total num frames: 3375104. Throughput: 0: 1035.0. Samples: 841582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:50:53,950][01485] Avg episode reward: [(0, '25.801')]
[2025-01-10 10:50:53,958][03555] Saving new best policy, reward=25.801!
[2025-01-10 10:50:58,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3387392. Throughput: 0: 1014.3. Samples: 846678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:50:58,949][01485] Avg episode reward: [(0, '25.257')]
[2025-01-10 10:51:00,970][03568] Updated weights for policy 0, policy_version 830 (0.0032)
[2025-01-10 10:51:03,946][01485] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3411968. Throughput: 0: 990.3. Samples: 852972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:51:03,951][01485] Avg episode reward: [(0, '24.214')]
[2025-01-10 10:51:08,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 3436544. Throughput: 0: 1014.0. Samples: 856576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:51:08,953][01485] Avg episode reward: [(0, '22.857')]
[2025-01-10 10:51:09,616][03568] Updated weights for policy 0, policy_version 840 (0.0031)
[2025-01-10 10:51:13,946][01485] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3452928. Throughput: 0: 1034.4. Samples: 862542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:51:13,951][01485] Avg episode reward: [(0, '22.440')]
[2025-01-10 10:51:18,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 4026.6). Total num frames: 3469312. Throughput: 0: 988.8. Samples: 867898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:51:18,948][01485] Avg episode reward: [(0, '22.485')]
[2025-01-10 10:51:20,580][03568] Updated weights for policy 0, policy_version 850 (0.0028)
[2025-01-10 10:51:23,946][01485] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3493888. Throughput: 0: 997.6. Samples: 871606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:51:23,953][01485] Avg episode reward: [(0, '21.824')]
[2025-01-10 10:51:28,948][01485] Fps is (10 sec: 4505.0, 60 sec: 4095.9, 300 sec: 4026.6). Total num frames: 3514368. Throughput: 0: 1054.1. Samples: 878450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:51:28,956][01485] Avg episode reward: [(0, '22.626')]
[2025-01-10 10:51:30,928][03568] Updated weights for policy 0, policy_version 860 (0.0023)
[2025-01-10 10:51:33,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3530752. Throughput: 0: 1000.0. Samples: 883042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:51:33,948][01485] Avg episode reward: [(0, '22.526')]
[2025-01-10 10:51:38,946][01485] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 3555328. Throughput: 0: 999.2. Samples: 886546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:51:38,951][01485] Avg episode reward: [(0, '22.394')]
[2025-01-10 10:51:40,210][03568] Updated weights for policy 0, policy_version 870 (0.0018)
[2025-01-10 10:51:43,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3571712. Throughput: 0: 1028.0. Samples: 892936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:51:43,954][01485] Avg episode reward: [(0, '22.587')]
[2025-01-10 10:51:48,946][01485] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3985.0). Total num frames: 3584000. Throughput: 0: 969.7. Samples: 896610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:51:48,949][01485] Avg episode reward: [(0, '22.756')]
[2025-01-10 10:51:53,824][03568] Updated weights for policy 0, policy_version 880 (0.0029)
[2025-01-10 10:51:53,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3998.9). Total num frames: 3604480. Throughput: 0: 936.9. Samples: 898736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:51:53,953][01485] Avg episode reward: [(0, '21.937')]
[2025-01-10 10:51:58,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3629056. Throughput: 0: 960.4. Samples: 905760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:51:58,953][01485] Avg episode reward: [(0, '21.809')]
[2025-01-10 10:52:02,379][03568] Updated weights for policy 0, policy_version 890 (0.0023)
[2025-01-10 10:52:03,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3649536. Throughput: 0: 991.0. Samples: 912492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:52:03,953][01485] Avg episode reward: [(0, '21.904')]
[2025-01-10 10:52:08,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3984.9). Total num frames: 3661824. Throughput: 0: 954.3. Samples: 914548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:52:08,950][01485] Avg episode reward: [(0, '21.581')]
[2025-01-10 10:52:08,963][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth...
[2025-01-10 10:52:09,113][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth
[2025-01-10 10:52:13,813][03568] Updated weights for policy 0, policy_version 900 (0.0017)
[2025-01-10 10:52:13,948][01485] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 4012.7). Total num frames: 3686400. Throughput: 0: 929.7. Samples: 920286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-10 10:52:13,955][01485] Avg episode reward: [(0, '21.877')]
[2025-01-10 10:52:18,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3710976. Throughput: 0: 990.8. Samples: 927630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-01-10 10:52:18,951][01485] Avg episode reward: [(0, '22.557')]
[2025-01-10 10:52:23,946][01485] Fps is (10 sec: 3687.1, 60 sec: 3822.9, 300 sec: 3985.0). Total num frames: 3723264. Throughput: 0: 975.2. Samples: 930428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:52:23,948][01485] Avg episode reward: [(0, '23.490')]
[2025-01-10 10:52:24,152][03568] Updated weights for policy 0, policy_version 910 (0.0014)
[2025-01-10 10:52:28,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3998.8). Total num frames: 3743744. Throughput: 0: 943.3. Samples: 935384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-10 10:52:28,953][01485] Avg episode reward: [(0, '23.848')]
[2025-01-10 10:52:33,138][03568] Updated weights for policy 0, policy_version 920 (0.0030)
[2025-01-10 10:52:33,952][01485] Fps is (10 sec: 4912.3, 60 sec: 4027.3, 300 sec: 4012.6). Total num frames: 3772416. Throughput: 0: 1026.4. Samples: 942804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:52:33,959][01485] Avg episode reward: [(0, '25.474')]
[2025-01-10 10:52:38,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 3788800. Throughput: 0: 1056.5. Samples: 946280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-01-10 10:52:38,953][01485] Avg episode reward: [(0, '25.169')]
[2025-01-10 10:52:43,946][01485] Fps is (10 sec: 3278.7, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 3805184. Throughput: 0: 999.6. Samples: 950742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-01-10 10:52:43,951][01485] Avg episode reward: [(0, '26.391')]
[2025-01-10 10:52:43,956][03555] Saving new best policy, reward=26.391!
[2025-01-10 10:52:44,459][03568] Updated weights for policy 0, policy_version 930 (0.0037)
[2025-01-10 10:52:48,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3829760. Throughput: 0: 1000.9. Samples: 957534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-10 10:52:48,953][01485] Avg episode reward: [(0, '25.127')]
[2025-01-10 10:52:52,651][03568] Updated weights for policy 0, policy_version 940 (0.0017)
[2025-01-10 10:52:53,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3854336. Throughput: 0: 1036.2. Samples: 961176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:52:53,948][01485] Avg episode reward: [(0, '23.362')]
[2025-01-10 10:52:58,951][01485] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 4012.6). Total num frames: 3866624. Throughput: 0: 1026.5. Samples: 966482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:52:58,953][01485] Avg episode reward: [(0, '22.392')]
[2025-01-10 10:53:03,796][03568] Updated weights for policy 0, policy_version 950 (0.0022)
[2025-01-10 10:53:03,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3891200. Throughput: 0: 996.2. Samples: 972458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:53:03,951][01485] Avg episode reward: [(0, '22.347')]
[2025-01-10 10:53:08,946][01485] Fps is (10 sec: 4507.4, 60 sec: 4164.3, 300 sec: 4054.4). Total num frames: 3911680. Throughput: 0: 1012.7. Samples: 976000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:53:08,949][01485] Avg episode reward: [(0, '22.070')]
[2025-01-10 10:53:13,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 3928064. Throughput: 0: 1040.6. Samples: 982210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:53:13,948][01485] Avg episode reward: [(0, '21.924')]
[2025-01-10 10:53:14,078][03568] Updated weights for policy 0, policy_version 960 (0.0032)
[2025-01-10 10:53:18,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3948544. Throughput: 0: 988.8. Samples: 987294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:53:18,954][01485] Avg episode reward: [(0, '21.808')]
[2025-01-10 10:53:23,466][03568] Updated weights for policy 0, policy_version 970 (0.0040)
[2025-01-10 10:53:23,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3973120. Throughput: 0: 992.0. Samples: 990918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:53:23,949][01485] Avg episode reward: [(0, '21.168')]
[2025-01-10 10:53:28,947][01485] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 4040.5). Total num frames: 3993600. Throughput: 0: 1055.6. Samples: 998246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:53:28,952][01485] Avg episode reward: [(0, '22.582')]
[2025-01-10 10:53:33,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 4026.6). Total num frames: 4009984. Throughput: 0: 1004.7. Samples: 1002746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:53:33,958][01485] Avg episode reward: [(0, '23.324')]
[2025-01-10 10:53:34,529][03568] Updated weights for policy 0, policy_version 980 (0.0045)
[2025-01-10 10:53:38,946][01485] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4068.3). Total num frames: 4034560. Throughput: 0: 994.1. Samples: 1005912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:53:38,948][01485] Avg episode reward: [(0, '24.053')]
[2025-01-10 10:53:43,017][03568] Updated weights for policy 0, policy_version 990 (0.0030)
[2025-01-10 10:53:43,950][01485] Fps is (10 sec: 4913.5, 60 sec: 4232.3, 300 sec: 4068.2). Total num frames: 4059136. Throughput: 0: 1040.4. Samples: 1013298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:53:43,952][01485] Avg episode reward: [(0, '25.369')]
[2025-01-10 10:53:48,949][01485] Fps is (10 sec: 3685.5, 60 sec: 4027.6, 300 sec: 4026.6). Total num frames: 4071424. Throughput: 0: 1024.9. Samples: 1018582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:53:48,951][01485] Avg episode reward: [(0, '25.411')]
[2025-01-10 10:53:53,946][01485] Fps is (10 sec: 3277.9, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 4091904. Throughput: 0: 999.2. Samples: 1020964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:53:53,953][01485] Avg episode reward: [(0, '24.748')]
[2025-01-10 10:53:54,181][03568] Updated weights for policy 0, policy_version 1000 (0.0029)
[2025-01-10 10:53:58,946][01485] Fps is (10 sec: 4506.7, 60 sec: 4164.5, 300 sec: 4054.3). Total num frames: 4116480. Throughput: 0: 1025.0. Samples: 1028336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-10 10:53:58,948][01485] Avg episode reward: [(0, '24.220')]
[2025-01-10 10:54:03,434][03568] Updated weights for policy 0, policy_version 1010 (0.0022)
[2025-01-10 10:54:03,947][01485] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 4136960. Throughput: 0: 1049.1. Samples: 1034504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:54:03,952][01485] Avg episode reward: [(0, '23.380')]
[2025-01-10 10:54:08,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 4153344. Throughput: 0: 1018.2. Samples: 1036736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:54:08,949][01485] Avg episode reward: [(0, '21.756')]
[2025-01-10 10:54:08,965][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001014_4153344.pth...
[2025-01-10 10:54:09,096][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000778_3186688.pth
[2025-01-10 10:54:13,728][03568] Updated weights for policy 0, policy_version 1020 (0.0042)
[2025-01-10 10:54:13,949][01485] Fps is (10 sec: 4095.1, 60 sec: 4164.1, 300 sec: 4054.3). Total num frames: 4177920. Throughput: 0: 998.7. Samples: 1043188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:54:13,956][01485] Avg episode reward: [(0, '21.539')]
[2025-01-10 10:54:18,948][01485] Fps is (10 sec: 4504.9, 60 sec: 4164.2, 300 sec: 4040.4). Total num frames: 4198400. Throughput: 0: 1057.1. Samples: 1050316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:54:18,954][01485] Avg episode reward: [(0, '20.635')]
[2025-01-10 10:54:23,947][01485] Fps is (10 sec: 3687.2, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 4214784. Throughput: 0: 1035.2. Samples: 1052498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-10 10:54:23,955][01485] Avg episode reward: [(0, '21.787')]
[2025-01-10 10:54:24,954][03568] Updated weights for policy 0, policy_version 1030 (0.0018)
[2025-01-10 10:54:28,949][01485] Fps is (10 sec: 3686.0, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 4235264. Throughput: 0: 997.4. Samples: 1058182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:54:28,954][01485] Avg episode reward: [(0, '23.320')]
[2025-01-10 10:54:33,114][03568] Updated weights for policy 0, policy_version 1040 (0.0027)
[2025-01-10 10:54:33,946][01485] Fps is (10 sec: 4915.4, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 4263936. Throughput: 0: 1042.5. Samples: 1065492. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:54:33,952][01485] Avg episode reward: [(0, '23.277')]
[2025-01-10 10:54:38,946][01485] Fps is (10 sec: 4097.1, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 4276224. Throughput: 0: 1054.2. Samples: 1068404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:54:38,949][01485] Avg episode reward: [(0, '24.557')]
[2025-01-10 10:54:43,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 4040.5). Total num frames: 4296704. Throughput: 0: 993.8. Samples: 1073058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:54:43,954][01485] Avg episode reward: [(0, '24.614')]
[2025-01-10 10:54:44,358][03568] Updated weights for policy 0, policy_version 1050 (0.0017)
[2025-01-10 10:54:48,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4054.3). Total num frames: 4321280. Throughput: 0: 1019.3. Samples: 1080374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:54:48,953][01485] Avg episode reward: [(0, '24.135')]
[2025-01-10 10:54:53,428][03568] Updated weights for policy 0, policy_version 1060 (0.0036)
[2025-01-10 10:54:53,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 4341760. Throughput: 0: 1050.7. Samples: 1084016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:54:53,951][01485] Avg episode reward: [(0, '23.533')]
[2025-01-10 10:54:58,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 4358144. Throughput: 0: 1009.2. Samples: 1088598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:54:58,952][01485] Avg episode reward: [(0, '23.410')]
[2025-01-10 10:55:03,928][03568] Updated weights for policy 0, policy_version 1070 (0.0020)
[2025-01-10 10:55:03,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 4382720. Throughput: 0: 1001.9. Samples: 1095398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:55:03,954][01485] Avg episode reward: [(0, '20.920')]
[2025-01-10 10:55:08,949][01485] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4054.3). Total num frames: 4403200. Throughput: 0: 1035.2. Samples: 1099082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:55:08,962][01485] Avg episode reward: [(0, '20.957')]
[2025-01-10 10:55:13,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 4419584. Throughput: 0: 1025.1. Samples: 1104308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:55:13,954][01485] Avg episode reward: [(0, '21.488')]
[2025-01-10 10:55:14,941][03568] Updated weights for policy 0, policy_version 1080 (0.0038)
[2025-01-10 10:55:18,946][01485] Fps is (10 sec: 3687.3, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 4440064. Throughput: 0: 995.1. Samples: 1110270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:55:18,958][01485] Avg episode reward: [(0, '21.399')]
[2025-01-10 10:55:23,389][03568] Updated weights for policy 0, policy_version 1090 (0.0025)
[2025-01-10 10:55:23,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 4464640. Throughput: 0: 1011.2. Samples: 1113910. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:55:23,953][01485] Avg episode reward: [(0, '22.485')]
[2025-01-10 10:55:28,954][01485] Fps is (10 sec: 4093.0, 60 sec: 4095.7, 300 sec: 4026.5). Total num frames: 4481024. Throughput: 0: 1047.2. Samples: 1120188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:55:28,956][01485] Avg episode reward: [(0, '23.049')]
[2025-01-10 10:55:33,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 4501504. Throughput: 0: 997.7. Samples: 1125270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:55:33,949][01485] Avg episode reward: [(0, '24.033')]
[2025-01-10 10:55:34,603][03568] Updated weights for policy 0, policy_version 1100 (0.0019)
[2025-01-10 10:55:38,946][01485] Fps is (10 sec: 3689.1, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 4517888. Throughput: 0: 994.8. Samples: 1128782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-10 10:55:38,952][01485] Avg episode reward: [(0, '23.297')]
[2025-01-10 10:55:43,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 4534272. Throughput: 0: 989.8. Samples: 1133138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:55:43,950][01485] Avg episode reward: [(0, '23.616')]
[2025-01-10 10:55:47,873][03568] Updated weights for policy 0, policy_version 1110 (0.0017)
[2025-01-10 10:55:48,946][01485] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3971.0). Total num frames: 4546560. Throughput: 0: 932.5. Samples: 1137360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:55:48,949][01485] Avg episode reward: [(0, '22.989')]
[2025-01-10 10:55:53,949][01485] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 4012.6). Total num frames: 4571136. Throughput: 0: 919.8. Samples: 1140474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:55:53,955][01485] Avg episode reward: [(0, '24.035')]
[2025-01-10 10:55:56,706][03568] Updated weights for policy 0, policy_version 1120 (0.0016)
[2025-01-10 10:55:58,946][01485] Fps is (10 sec: 4915.1, 60 sec: 3959.4, 300 sec: 4012.7). Total num frames: 4595712. Throughput: 0: 969.1. Samples: 1147920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:55:58,951][01485] Avg episode reward: [(0, '24.396')]
[2025-01-10 10:56:03,946][01485] Fps is (10 sec: 4097.4, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 4612096. Throughput: 0: 958.0. Samples: 1153380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:56:03,953][01485] Avg episode reward: [(0, '24.282')]
[2025-01-10 10:56:07,836][03568] Updated weights for policy 0, policy_version 1130 (0.0017)
[2025-01-10 10:56:08,946][01485] Fps is (10 sec: 3686.5, 60 sec: 3823.1, 300 sec: 3998.8). Total num frames: 4632576. Throughput: 0: 926.6. Samples: 1155606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:56:08,948][01485] Avg episode reward: [(0, '23.033')]
[2025-01-10 10:56:08,962][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001131_4632576.pth...
[2025-01-10 10:56:09,084][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth
[2025-01-10 10:56:13,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 4657152. Throughput: 0: 946.9. Samples: 1162790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:56:13,948][01485] Avg episode reward: [(0, '21.431')]
[2025-01-10 10:56:16,575][03568] Updated weights for policy 0, policy_version 1140 (0.0017)
[2025-01-10 10:56:18,946][01485] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 4673536. Throughput: 0: 976.5. Samples: 1169212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:56:18,955][01485] Avg episode reward: [(0, '20.384')]
[2025-01-10 10:56:23,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3984.9). Total num frames: 4689920. Throughput: 0: 947.6. Samples: 1171424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:56:23,952][01485] Avg episode reward: [(0, '20.923')]
[2025-01-10 10:56:27,293][03568] Updated weights for policy 0, policy_version 1150 (0.0017)
[2025-01-10 10:56:28,946][01485] Fps is (10 sec: 4096.0, 60 sec: 3891.7, 300 sec: 4012.7). Total num frames: 4714496. Throughput: 0: 994.1. Samples: 1177872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:56:28,952][01485] Avg episode reward: [(0, '23.014')]
[2025-01-10 10:56:33,954][01485] Fps is (10 sec: 4911.4, 60 sec: 3959.0, 300 sec: 4012.6). Total num frames: 4739072. Throughput: 0: 1062.8. Samples: 1185192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:56:33,963][01485] Avg episode reward: [(0, '23.432')]
[2025-01-10 10:56:37,661][03568] Updated weights for policy 0, policy_version 1160 (0.0020)
[2025-01-10 10:56:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 4751360. Throughput: 0: 1042.2. Samples: 1187370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:56:38,950][01485] Avg episode reward: [(0, '23.245')]
[2025-01-10 10:56:43,946][01485] Fps is (10 sec: 3689.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 4775936. Throughput: 0: 998.3. Samples: 1192844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:56:43,950][01485] Avg episode reward: [(0, '23.445')]
[2025-01-10 10:56:46,848][03568] Updated weights for policy 0, policy_version 1170 (0.0020)
[2025-01-10 10:56:48,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4054.3). Total num frames: 4800512. Throughput: 0: 1042.8. Samples: 1200304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 10:56:48,950][01485] Avg episode reward: [(0, '22.464')]
[2025-01-10 10:56:53,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4026.6). Total num frames: 4816896. Throughput: 0: 1061.1. Samples: 1203356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:56:53,951][01485] Avg episode reward: [(0, '21.588')]
[2025-01-10 10:56:57,886][03568] Updated weights for policy 0, policy_version 1180 (0.0020)
[2025-01-10 10:56:58,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 4837376. Throughput: 0: 1005.1. Samples: 1208020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:56:58,948][01485] Avg episode reward: [(0, '22.740')]
[2025-01-10 10:57:03,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 4861952. Throughput: 0: 1024.9. Samples: 1215332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 10:57:03,948][01485] Avg episode reward: [(0, '22.272')]
[2025-01-10 10:57:06,134][03568] Updated weights for policy 0, policy_version 1190 (0.0021)
[2025-01-10 10:57:08,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.4). Total num frames: 4882432. Throughput: 0: 1056.2. Samples: 1218952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 10:57:08,950][01485] Avg episode reward: [(0, '22.970')]
[2025-01-10 10:57:13,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 4894720. Throughput: 0: 1017.0. Samples: 1223636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:57:13,949][01485] Avg episode reward: [(0, '23.348')]
[2025-01-10 10:57:17,501][03568] Updated weights for policy 0, policy_version 1200 (0.0030)
[2025-01-10 10:57:18,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 4919296. Throughput: 0: 999.2. Samples: 1230148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:57:18,953][01485] Avg episode reward: [(0, '23.531')]
[2025-01-10 10:57:23,948][01485] Fps is (10 sec: 4914.5, 60 sec: 4232.4, 300 sec: 4068.2). Total num frames: 4943872. Throughput: 0: 1032.4. Samples: 1233828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:57:23,950][01485] Avg episode reward: [(0, '23.288')]
[2025-01-10 10:57:26,960][03568] Updated weights for policy 0, policy_version 1210 (0.0027)
[2025-01-10 10:57:28,947][01485] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 4960256. Throughput: 0: 1037.3. Samples: 1239522. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-01-10 10:57:28,950][01485] Avg episode reward: [(0, '22.993')]
[2025-01-10 10:57:33,950][01485] Fps is (10 sec: 3685.7, 60 sec: 4028.0, 300 sec: 4040.4). Total num frames: 4980736. Throughput: 0: 998.9. Samples: 1245260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 10:57:33,956][01485] Avg episode reward: [(0, '22.579')]
[2025-01-10 10:57:36,797][03568] Updated weights for policy 0, policy_version 1220 (0.0023)
[2025-01-10 10:57:38,946][01485] Fps is (10 sec: 4505.8, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 5005312. Throughput: 0: 1013.2. Samples: 1248952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:57:38,949][01485] Avg episode reward: [(0, '22.075')]
[2025-01-10 10:57:43,947][01485] Fps is (10 sec: 4097.1, 60 sec: 4095.9, 300 sec: 4040.4). Total num frames: 5021696. Throughput: 0: 1055.0. Samples: 1255494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:57:43,952][01485] Avg episode reward: [(0, '21.627')]
[2025-01-10 10:57:48,106][03568] Updated weights for policy 0, policy_version 1230 (0.0018)
[2025-01-10 10:57:48,946][01485] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 5042176. Throughput: 0: 996.8. Samples: 1260186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:57:48,948][01485] Avg episode reward: [(0, '21.476')]
[2025-01-10 10:57:53,949][01485] Fps is (10 sec: 4095.1, 60 sec: 4095.8, 300 sec: 4054.4). Total num frames: 5062656. Throughput: 0: 996.5. Samples: 1263796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:57:53,951][01485] Avg episode reward: [(0, '20.960')]
[2025-01-10 10:57:56,457][03568] Updated weights for policy 0, policy_version 1240 (0.0023)
[2025-01-10 10:57:58,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 5087232. Throughput: 0: 1058.3. Samples: 1271258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:57:58,950][01485] Avg episode reward: [(0, '22.189')]
[2025-01-10 10:58:03,948][01485] Fps is (10 sec: 4096.6, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 5103616. Throughput: 0: 1018.4. Samples: 1275976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:58:03,950][01485] Avg episode reward: [(0, '21.637')]
[2025-01-10 10:58:07,391][03568] Updated weights for policy 0, policy_version 1250 (0.0036)
[2025-01-10 10:58:08,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 5124096. Throughput: 0: 1005.0. Samples: 1279052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:58:08,950][01485] Avg episode reward: [(0, '21.207')]
[2025-01-10 10:58:08,963][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001251_5124096.pth...
[2025-01-10 10:58:09,080][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001014_4153344.pth
[2025-01-10 10:58:13,955][01485] Fps is (10 sec: 4502.2, 60 sec: 4231.9, 300 sec: 4068.1). Total num frames: 5148672. Throughput: 0: 1039.1. Samples: 1286290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:58:13,964][01485] Avg episode reward: [(0, '21.067')]
[2025-01-10 10:58:16,994][03568] Updated weights for policy 0, policy_version 1260 (0.0028)
[2025-01-10 10:58:18,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 5165056. Throughput: 0: 1033.3. Samples: 1291754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:58:18,949][01485] Avg episode reward: [(0, '20.832')]
[2025-01-10 10:58:23,946][01485] Fps is (10 sec: 3689.9, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 5185536. Throughput: 0: 1000.8. Samples: 1293986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:58:23,952][01485] Avg episode reward: [(0, '20.672')]
[2025-01-10 10:58:27,159][03568] Updated weights for policy 0, policy_version 1270 (0.0033)
[2025-01-10 10:58:28,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 5210112. Throughput: 0: 1015.0. Samples: 1301170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-01-10 10:58:28,949][01485] Avg episode reward: [(0, '21.476')]
[2025-01-10 10:58:33,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.5, 300 sec: 4054.3). Total num frames: 5230592. Throughput: 0: 1056.3. Samples: 1307718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:58:33,954][01485] Avg episode reward: [(0, '22.912')]
[2025-01-10 10:58:37,939][03568] Updated weights for policy 0, policy_version 1280 (0.0015)
[2025-01-10 10:58:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 5246976. Throughput: 0: 1025.2. Samples: 1309928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:58:38,951][01485] Avg episode reward: [(0, '22.942')]
[2025-01-10 10:58:43,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4054.4). Total num frames: 5267456. Throughput: 0: 998.4. Samples: 1316188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:58:43,948][01485] Avg episode reward: [(0, '22.977')]
[2025-01-10 10:58:46,493][03568] Updated weights for policy 0, policy_version 1290 (0.0023)
[2025-01-10 10:58:48,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 5292032. Throughput: 0: 1056.7. Samples: 1323526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 10:58:48,951][01485] Avg episode reward: [(0, '23.782')]
[2025-01-10 10:58:53,947][01485] Fps is (10 sec: 4095.5, 60 sec: 4096.1, 300 sec: 4040.4). Total num frames: 5308416. Throughput: 0: 1036.0. Samples: 1325672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 10:58:53,953][01485] Avg episode reward: [(0, '23.976')]
[2025-01-10 10:58:57,630][03568] Updated weights for policy 0, policy_version 1300 (0.0027)
[2025-01-10 10:58:58,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 5328896. Throughput: 0: 997.2. Samples: 1331156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 10:58:58,954][01485] Avg episode reward: [(0, '24.368')]
[2025-01-10 10:59:03,946][01485] Fps is (10 sec: 4506.1, 60 sec: 4164.4, 300 sec: 4068.2). Total num frames: 5353472. Throughput: 0: 1039.5. Samples: 1338530.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:59:03,949][01485] Avg episode reward: [(0, '24.651')] [2025-01-10 10:59:06,595][03568] Updated weights for policy 0, policy_version 1310 (0.0016) [2025-01-10 10:59:08,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 5369856. Throughput: 0: 1058.5. Samples: 1341620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:59:08,949][01485] Avg episode reward: [(0, '25.038')] [2025-01-10 10:59:13,946][01485] Fps is (10 sec: 3686.5, 60 sec: 4028.4, 300 sec: 4040.5). Total num frames: 5390336. Throughput: 0: 998.0. Samples: 1346080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:59:13,949][01485] Avg episode reward: [(0, '24.780')] [2025-01-10 10:59:17,146][03568] Updated weights for policy 0, policy_version 1320 (0.0029) [2025-01-10 10:59:18,946][01485] Fps is (10 sec: 4505.5, 60 sec: 4164.2, 300 sec: 4068.2). Total num frames: 5414912. Throughput: 0: 1016.3. Samples: 1353450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 10:59:18,949][01485] Avg episode reward: [(0, '22.955')] [2025-01-10 10:59:23,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.3). Total num frames: 5435392. Throughput: 0: 1048.4. Samples: 1357106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:59:23,952][01485] Avg episode reward: [(0, '22.247')] [2025-01-10 10:59:28,053][03568] Updated weights for policy 0, policy_version 1330 (0.0044) [2025-01-10 10:59:28,946][01485] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 5447680. Throughput: 0: 1015.2. Samples: 1361870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 10:59:28,948][01485] Avg episode reward: [(0, '22.045')] [2025-01-10 10:59:33,947][01485] Fps is (10 sec: 3686.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 5472256. Throughput: 0: 999.4. Samples: 1368502. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:59:33,950][01485] Avg episode reward: [(0, '23.468')] [2025-01-10 10:59:36,955][03568] Updated weights for policy 0, policy_version 1340 (0.0015) [2025-01-10 10:59:38,948][01485] Fps is (10 sec: 4504.6, 60 sec: 4095.8, 300 sec: 4054.3). Total num frames: 5492736. Throughput: 0: 1030.5. Samples: 1372044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 10:59:38,951][01485] Avg episode reward: [(0, '24.066')] [2025-01-10 10:59:43,946][01485] Fps is (10 sec: 3277.2, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 5505024. Throughput: 0: 997.2. Samples: 1376030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:59:43,953][01485] Avg episode reward: [(0, '24.380')] [2025-01-10 10:59:48,946][01485] Fps is (10 sec: 2867.8, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 5521408. Throughput: 0: 926.4. Samples: 1380218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 10:59:48,949][01485] Avg episode reward: [(0, '25.164')] [2025-01-10 10:59:50,291][03568] Updated weights for policy 0, policy_version 1350 (0.0033) [2025-01-10 10:59:53,946][01485] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 4026.6). Total num frames: 5545984. Throughput: 0: 938.8. Samples: 1383866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 10:59:53,951][01485] Avg episode reward: [(0, '24.855')] [2025-01-10 10:59:58,949][01485] Fps is (10 sec: 4504.3, 60 sec: 3959.3, 300 sec: 4012.6). Total num frames: 5566464. Throughput: 0: 1003.0. Samples: 1391220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 10:59:58,951][01485] Avg episode reward: [(0, '24.496')] [2025-01-10 10:59:59,368][03568] Updated weights for policy 0, policy_version 1360 (0.0018) [2025-01-10 11:00:03,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 5582848. Throughput: 0: 945.4. Samples: 1395994. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:00:03,948][01485] Avg episode reward: [(0, '24.438')] [2025-01-10 11:00:08,946][01485] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 5603328. Throughput: 0: 929.4. Samples: 1398928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:00:08,948][01485] Avg episode reward: [(0, '22.826')] [2025-01-10 11:00:08,982][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001369_5607424.pth... [2025-01-10 11:00:09,106][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001131_4632576.pth [2025-01-10 11:00:09,835][03568] Updated weights for policy 0, policy_version 1370 (0.0023) [2025-01-10 11:00:13,948][01485] Fps is (10 sec: 4504.6, 60 sec: 3959.3, 300 sec: 4026.5). Total num frames: 5627904. Throughput: 0: 984.1. Samples: 1406156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 11:00:13,950][01485] Avg episode reward: [(0, '22.520')] [2025-01-10 11:00:18,952][01485] Fps is (10 sec: 4093.7, 60 sec: 3822.6, 300 sec: 3998.7). Total num frames: 5644288. Throughput: 0: 962.4. Samples: 1411814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 11:00:18,956][01485] Avg episode reward: [(0, '23.072')] [2025-01-10 11:00:20,510][03568] Updated weights for policy 0, policy_version 1380 (0.0021) [2025-01-10 11:00:23,946][01485] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 4012.8). Total num frames: 5664768. Throughput: 0: 933.7. Samples: 1414060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 11:00:23,949][01485] Avg episode reward: [(0, '23.692')] [2025-01-10 11:00:28,946][01485] Fps is (10 sec: 4508.1, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 5689344. Throughput: 0: 1003.0. Samples: 1421166. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 11:00:28,949][01485] Avg episode reward: [(0, '24.755')] [2025-01-10 11:00:29,488][03568] Updated weights for policy 0, policy_version 1390 (0.0018) [2025-01-10 11:00:33,946][01485] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 5709824. Throughput: 0: 1059.5. Samples: 1427894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:00:33,951][01485] Avg episode reward: [(0, '24.122')] [2025-01-10 11:00:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 4040.5). Total num frames: 5726208. Throughput: 0: 1026.4. Samples: 1430056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:00:38,948][01485] Avg episode reward: [(0, '24.290')] [2025-01-10 11:00:40,456][03568] Updated weights for policy 0, policy_version 1400 (0.0024) [2025-01-10 11:00:43,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 5750784. Throughput: 0: 1001.8. Samples: 1436296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:00:43,949][01485] Avg episode reward: [(0, '25.121')] [2025-01-10 11:00:48,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.3). Total num frames: 5771264. Throughput: 0: 1059.1. Samples: 1443654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:00:48,949][01485] Avg episode reward: [(0, '23.514')] [2025-01-10 11:00:49,149][03568] Updated weights for policy 0, policy_version 1410 (0.0016) [2025-01-10 11:00:53,946][01485] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 5787648. Throughput: 0: 1046.0. Samples: 1445998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:00:53,950][01485] Avg episode reward: [(0, '22.803')] [2025-01-10 11:00:58,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 4054.3). Total num frames: 5808128. Throughput: 0: 1007.0. Samples: 1451468. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:00:58,951][01485] Avg episode reward: [(0, '23.458')] [2025-01-10 11:00:59,871][03568] Updated weights for policy 0, policy_version 1420 (0.0028) [2025-01-10 11:01:03,946][01485] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 5832704. Throughput: 0: 1044.6. Samples: 1458816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:01:03,951][01485] Avg episode reward: [(0, '23.915')] [2025-01-10 11:01:08,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 5849088. Throughput: 0: 1064.0. Samples: 1461942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:01:08,953][01485] Avg episode reward: [(0, '23.365')] [2025-01-10 11:01:10,306][03568] Updated weights for policy 0, policy_version 1430 (0.0016) [2025-01-10 11:01:13,946][01485] Fps is (10 sec: 3686.5, 60 sec: 4027.9, 300 sec: 4054.3). Total num frames: 5869568. Throughput: 0: 1003.4. Samples: 1466320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:01:13,955][01485] Avg episode reward: [(0, '23.065')] [2025-01-10 11:01:18,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 4082.1). Total num frames: 5894144. Throughput: 0: 1016.8. Samples: 1473648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 11:01:18,950][01485] Avg episode reward: [(0, '22.953')] [2025-01-10 11:01:19,486][03568] Updated weights for policy 0, policy_version 1440 (0.0027) [2025-01-10 11:01:23,949][01485] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 5914624. Throughput: 0: 1049.4. Samples: 1477282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:01:23,952][01485] Avg episode reward: [(0, '23.046')] [2025-01-10 11:01:28,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.6). Total num frames: 5931008. Throughput: 0: 1020.3. Samples: 1482208. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:01:28,953][01485] Avg episode reward: [(0, '21.197')] [2025-01-10 11:01:30,462][03568] Updated weights for policy 0, policy_version 1450 (0.0016) [2025-01-10 11:01:33,946][01485] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 5955584. Throughput: 0: 1002.4. Samples: 1488764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 11:01:33,949][01485] Avg episode reward: [(0, '21.446')] [2025-01-10 11:01:38,758][03568] Updated weights for policy 0, policy_version 1460 (0.0017) [2025-01-10 11:01:38,949][01485] Fps is (10 sec: 4914.0, 60 sec: 4232.4, 300 sec: 4082.1). Total num frames: 5980160. Throughput: 0: 1032.2. Samples: 1492448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:01:38,953][01485] Avg episode reward: [(0, '22.265')] [2025-01-10 11:01:43,954][01485] Fps is (10 sec: 3683.7, 60 sec: 4027.2, 300 sec: 4040.4). Total num frames: 5992448. Throughput: 0: 1037.3. Samples: 1498152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:01:43,960][01485] Avg episode reward: [(0, '23.285')] [2025-01-10 11:01:48,950][01485] Fps is (10 sec: 3276.5, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 6012928. Throughput: 0: 998.0. Samples: 1503728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 11:01:48,957][01485] Avg episode reward: [(0, '23.989')] [2025-01-10 11:01:49,884][03568] Updated weights for policy 0, policy_version 1470 (0.0023) [2025-01-10 11:01:53,946][01485] Fps is (10 sec: 4509.0, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 6037504. Throughput: 0: 1008.9. Samples: 1507342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:01:53,948][01485] Avg episode reward: [(0, '25.070')] [2025-01-10 11:01:58,947][01485] Fps is (10 sec: 4506.9, 60 sec: 4164.2, 300 sec: 4054.3). Total num frames: 6057984. Throughput: 0: 1060.3. Samples: 1514032. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:01:58,952][01485] Avg episode reward: [(0, '27.143')] [2025-01-10 11:01:58,962][03555] Saving new best policy, reward=27.143! [2025-01-10 11:02:00,173][03568] Updated weights for policy 0, policy_version 1480 (0.0020) [2025-01-10 11:02:03,946][01485] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 6074368. Throughput: 0: 1000.4. Samples: 1518666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:02:03,949][01485] Avg episode reward: [(0, '27.507')] [2025-01-10 11:02:03,952][03555] Saving new best policy, reward=27.507! [2025-01-10 11:02:08,946][01485] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 6098944. Throughput: 0: 999.0. Samples: 1522234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:08,949][01485] Avg episode reward: [(0, '25.875')] [2025-01-10 11:02:08,961][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth... [2025-01-10 11:02:09,094][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001251_5124096.pth [2025-01-10 11:02:09,475][03568] Updated weights for policy 0, policy_version 1490 (0.0022) [2025-01-10 11:02:13,946][01485] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 6119424. Throughput: 0: 1048.3. Samples: 1529380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:13,953][01485] Avg episode reward: [(0, '27.115')] [2025-01-10 11:02:18,949][01485] Fps is (10 sec: 3685.4, 60 sec: 4027.5, 300 sec: 4040.4). Total num frames: 6135808. Throughput: 0: 1006.2. Samples: 1534048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:18,951][01485] Avg episode reward: [(0, '27.684')] [2025-01-10 11:02:18,969][03555] Saving new best policy, reward=27.684! 
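The "Saving new best policy, reward=…!" and "Removing …checkpoint_….pth" records above reflect two pieces of bookkeeping: old checkpoints are rotated out so only the newest few remain, and a separate "best" copy is refreshed whenever the average episode reward exceeds the previous best. The following is a minimal illustrative sketch of that logic, not Sample Factory's actual implementation; the class name, the `keep=2` rotation depth, and the message strings are assumptions for illustration.

```python
class CheckpointRotator:
    """Illustrative sketch (not Sample Factory's code) of the checkpoint
    rotation and best-policy tracking visible in the log above."""

    def __init__(self, keep=2):
        self.keep = keep              # assumed rotation depth
        self.saved = []               # oldest -> newest checkpoint paths
        self.best_reward = float("-inf")

    def save(self, path, avg_reward):
        """Record a new checkpoint; return log-style messages produced."""
        messages = [f"Saving {path}..."]
        self.saved.append(path)
        # Rotate: drop the oldest checkpoint once we exceed the keep limit.
        if len(self.saved) > self.keep:
            old = self.saved.pop(0)
            messages.append(f"Removing {old}")
        # Refresh the best policy whenever avg reward improves.
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            messages.append(f"Saving new best policy, reward={avg_reward:.3f}!")
        return messages


rot = CheckpointRotator(keep=2)
m1 = rot.save("checkpoint_000001251.pth", 21.207)
m2 = rot.save("checkpoint_000001369.pth", 20.832)
m3 = rot.save("checkpoint_000001489.pth", 27.143)
print(m3)  # saving + removal of the oldest + new best-policy message
```

The reward values fed in above are taken from nearby log records; the pairing of specific rewards to specific checkpoints is illustrative only.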
[2025-01-10 11:02:20,748][03568] Updated weights for policy 0, policy_version 1500 (0.0021) [2025-01-10 11:02:23,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4068.2). Total num frames: 6160384. Throughput: 0: 991.0. Samples: 1537040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:23,951][01485] Avg episode reward: [(0, '26.298')] [2025-01-10 11:02:28,946][01485] Fps is (10 sec: 4506.9, 60 sec: 4164.3, 300 sec: 4068.3). Total num frames: 6180864. Throughput: 0: 1029.1. Samples: 1544456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:28,950][01485] Avg episode reward: [(0, '24.482')] [2025-01-10 11:02:29,022][03568] Updated weights for policy 0, policy_version 1510 (0.0019) [2025-01-10 11:02:33,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 6197248. Throughput: 0: 1027.9. Samples: 1549978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 11:02:33,949][01485] Avg episode reward: [(0, '26.127')] [2025-01-10 11:02:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4054.4). Total num frames: 6217728. Throughput: 0: 998.3. Samples: 1552266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:38,953][01485] Avg episode reward: [(0, '25.104')] [2025-01-10 11:02:40,076][03568] Updated weights for policy 0, policy_version 1520 (0.0021) [2025-01-10 11:02:43,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.8, 300 sec: 4068.2). Total num frames: 6242304. Throughput: 0: 1007.9. Samples: 1559388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:02:43,948][01485] Avg episode reward: [(0, '25.525')] [2025-01-10 11:02:48,947][01485] Fps is (10 sec: 4505.1, 60 sec: 4164.4, 300 sec: 4068.3). Total num frames: 6262784. Throughput: 0: 1049.1. Samples: 1565876. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:02:48,950][01485] Avg episode reward: [(0, '26.605')] [2025-01-10 11:02:50,180][03568] Updated weights for policy 0, policy_version 1530 (0.0019) [2025-01-10 11:02:53,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 6275072. Throughput: 0: 1019.0. Samples: 1568088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:02:53,948][01485] Avg episode reward: [(0, '26.271')] [2025-01-10 11:02:58,946][01485] Fps is (10 sec: 3686.8, 60 sec: 4027.8, 300 sec: 4054.4). Total num frames: 6299648. Throughput: 0: 998.8. Samples: 1574326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:02:58,948][01485] Avg episode reward: [(0, '26.709')] [2025-01-10 11:02:59,883][03568] Updated weights for policy 0, policy_version 1540 (0.0013) [2025-01-10 11:03:03,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 6324224. Throughput: 0: 1054.5. Samples: 1581498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 11:03:03,950][01485] Avg episode reward: [(0, '25.385')] [2025-01-10 11:03:08,953][01485] Fps is (10 sec: 3684.1, 60 sec: 3959.0, 300 sec: 4026.6). Total num frames: 6336512. Throughput: 0: 1037.6. Samples: 1583738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:03:08,955][01485] Avg episode reward: [(0, '24.823')] [2025-01-10 11:03:11,359][03568] Updated weights for policy 0, policy_version 1550 (0.0025) [2025-01-10 11:03:13,952][01485] Fps is (10 sec: 3275.0, 60 sec: 3959.1, 300 sec: 4040.4). Total num frames: 6356992. Throughput: 0: 984.9. Samples: 1588780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:03:13,954][01485] Avg episode reward: [(0, '24.231')] [2025-01-10 11:03:18,946][01485] Fps is (10 sec: 4918.4, 60 sec: 4164.5, 300 sec: 4068.2). Total num frames: 6385664. Throughput: 0: 1023.4. Samples: 1596030. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 11:03:18,949][01485] Avg episode reward: [(0, '23.468')] [2025-01-10 11:03:19,642][03568] Updated weights for policy 0, policy_version 1560 (0.0013) [2025-01-10 11:03:23,946][01485] Fps is (10 sec: 4508.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 6402048. Throughput: 0: 1046.7. Samples: 1599366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-10 11:03:23,952][01485] Avg episode reward: [(0, '24.008')] [2025-01-10 11:03:28,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 6418432. Throughput: 0: 987.0. Samples: 1603804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:03:28,955][01485] Avg episode reward: [(0, '24.412')] [2025-01-10 11:03:30,980][03568] Updated weights for policy 0, policy_version 1570 (0.0028) [2025-01-10 11:03:33,948][01485] Fps is (10 sec: 3685.7, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 6438912. Throughput: 0: 982.1. Samples: 1610072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:03:33,952][01485] Avg episode reward: [(0, '24.193')] [2025-01-10 11:03:38,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 6451200. Throughput: 0: 981.3. Samples: 1612248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:03:38,953][01485] Avg episode reward: [(0, '23.644')] [2025-01-10 11:03:43,953][01485] Fps is (10 sec: 2865.9, 60 sec: 3754.3, 300 sec: 3984.8). Total num frames: 6467584. Throughput: 0: 938.4. Samples: 1616560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:03:43,957][01485] Avg episode reward: [(0, '23.192')] [2025-01-10 11:03:44,814][03568] Updated weights for policy 0, policy_version 1580 (0.0023) [2025-01-10 11:03:48,947][01485] Fps is (10 sec: 3686.2, 60 sec: 3754.7, 300 sec: 3998.8). Total num frames: 6488064. Throughput: 0: 910.8. Samples: 1622484. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:03:48,948][01485] Avg episode reward: [(0, '23.017')] [2025-01-10 11:03:53,310][03568] Updated weights for policy 0, policy_version 1590 (0.0021) [2025-01-10 11:03:53,946][01485] Fps is (10 sec: 4508.5, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 6512640. Throughput: 0: 940.8. Samples: 1626070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:03:53,953][01485] Avg episode reward: [(0, '22.472')] [2025-01-10 11:03:58,950][01485] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3984.9). Total num frames: 6529024. Throughput: 0: 968.3. Samples: 1632352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 11:03:58,952][01485] Avg episode reward: [(0, '22.780')] [2025-01-10 11:04:03,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3998.8). Total num frames: 6549504. Throughput: 0: 919.6. Samples: 1637410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-10 11:04:03,953][01485] Avg episode reward: [(0, '23.645')] [2025-01-10 11:04:04,449][03568] Updated weights for policy 0, policy_version 1600 (0.0018) [2025-01-10 11:04:08,946][01485] Fps is (10 sec: 4507.4, 60 sec: 3959.9, 300 sec: 4012.7). Total num frames: 6574080. Throughput: 0: 926.7. Samples: 1641068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:04:08,954][01485] Avg episode reward: [(0, '25.487')] [2025-01-10 11:04:08,969][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001605_6574080.pth... [2025-01-10 11:04:09,097][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001369_5607424.pth [2025-01-10 11:04:13,563][03568] Updated weights for policy 0, policy_version 1610 (0.0023) [2025-01-10 11:04:13,949][01485] Fps is (10 sec: 4504.2, 60 sec: 3959.6, 300 sec: 3998.8). Total num frames: 6594560. Throughput: 0: 987.8. Samples: 1648256. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:04:13,957][01485] Avg episode reward: [(0, '25.587')] [2025-01-10 11:04:18,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3971.0). Total num frames: 6606848. Throughput: 0: 944.9. Samples: 1652592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:04:18,953][01485] Avg episode reward: [(0, '25.969')] [2025-01-10 11:04:23,946][01485] Fps is (10 sec: 3687.6, 60 sec: 3822.9, 300 sec: 4012.7). Total num frames: 6631424. Throughput: 0: 970.3. Samples: 1655912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:04:23,952][01485] Avg episode reward: [(0, '26.960')] [2025-01-10 11:04:24,140][03568] Updated weights for policy 0, policy_version 1620 (0.0021) [2025-01-10 11:04:28,946][01485] Fps is (10 sec: 4915.1, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 6656000. Throughput: 0: 1040.1. Samples: 1663360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-10 11:04:28,954][01485] Avg episode reward: [(0, '25.995')] [2025-01-10 11:04:33,950][01485] Fps is (10 sec: 4094.2, 60 sec: 3891.0, 300 sec: 3998.8). Total num frames: 6672384. Throughput: 0: 1023.1. Samples: 1668526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:04:33,956][01485] Avg episode reward: [(0, '25.400')] [2025-01-10 11:04:34,491][03568] Updated weights for policy 0, policy_version 1630 (0.0028) [2025-01-10 11:04:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 6692864. Throughput: 0: 996.7. Samples: 1670924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:04:38,952][01485] Avg episode reward: [(0, '25.449')] [2025-01-10 11:04:43,715][03568] Updated weights for policy 0, policy_version 1640 (0.0014) [2025-01-10 11:04:43,946][01485] Fps is (10 sec: 4507.6, 60 sec: 4164.7, 300 sec: 4054.3). Total num frames: 6717440. Throughput: 0: 1017.8. Samples: 1678150. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:04:43,952][01485] Avg episode reward: [(0, '24.791')] [2025-01-10 11:04:48,947][01485] Fps is (10 sec: 4095.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 6733824. Throughput: 0: 1041.6. Samples: 1684284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:04:48,954][01485] Avg episode reward: [(0, '25.523')] [2025-01-10 11:04:53,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 6754304. Throughput: 0: 1009.2. Samples: 1686484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-10 11:04:53,953][01485] Avg episode reward: [(0, '25.225')] [2025-01-10 11:04:54,819][03568] Updated weights for policy 0, policy_version 1650 (0.0013) [2025-01-10 11:04:58,946][01485] Fps is (10 sec: 4506.1, 60 sec: 4164.5, 300 sec: 4054.3). Total num frames: 6778880. Throughput: 0: 999.0. Samples: 1693210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-10 11:04:58,949][01485] Avg episode reward: [(0, '24.535')] [2025-01-10 11:05:03,366][03568] Updated weights for policy 0, policy_version 1660 (0.0017) [2025-01-10 11:05:03,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 6799360. Throughput: 0: 1060.6. Samples: 1700318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-10 11:05:03,950][01485] Avg episode reward: [(0, '25.167')] [2025-01-10 11:05:08,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 6815744. Throughput: 0: 1035.0. Samples: 1702488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-10 11:05:08,954][01485] Avg episode reward: [(0, '24.894')] [2025-01-10 11:05:13,932][03568] Updated weights for policy 0, policy_version 1670 (0.0028) [2025-01-10 11:05:13,950][01485] Fps is (10 sec: 4094.2, 60 sec: 4095.9, 300 sec: 4054.4). Total num frames: 6840320. Throughput: 0: 1000.5. Samples: 1708386. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:05:13,952][01485] Avg episode reward: [(0, '23.305')]
[2025-01-10 11:05:18,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4054.3). Total num frames: 6860800. Throughput: 0: 1043.5. Samples: 1715480. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:05:18,953][01485] Avg episode reward: [(0, '23.215')]
[2025-01-10 11:05:23,946][01485] Fps is (10 sec: 3688.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 6877184. Throughput: 0: 1051.3. Samples: 1718232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:05:23,950][01485] Avg episode reward: [(0, '24.043')]
[2025-01-10 11:05:24,939][03568] Updated weights for policy 0, policy_version 1680 (0.0022)
[2025-01-10 11:05:28,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 6897664. Throughput: 0: 999.1. Samples: 1723108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:05:28,951][01485] Avg episode reward: [(0, '24.614')]
[2025-01-10 11:05:33,668][03568] Updated weights for policy 0, policy_version 1690 (0.0015)
[2025-01-10 11:05:33,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.6, 300 sec: 4054.3). Total num frames: 6922240. Throughput: 0: 1026.2. Samples: 1730460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 11:05:33,953][01485] Avg episode reward: [(0, '24.452')]
[2025-01-10 11:05:38,949][01485] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4040.4). Total num frames: 6942720. Throughput: 0: 1058.2. Samples: 1734106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 11:05:38,955][01485] Avg episode reward: [(0, '24.833')]
[2025-01-10 11:05:43,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 6955008. Throughput: 0: 1007.0. Samples: 1738524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 11:05:43,948][01485] Avg episode reward: [(0, '25.419')]
[2025-01-10 11:05:44,940][03568] Updated weights for policy 0, policy_version 1700 (0.0020)
[2025-01-10 11:05:48,946][01485] Fps is (10 sec: 3687.3, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 6979584. Throughput: 0: 1001.4. Samples: 1745382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 11:05:48,952][01485] Avg episode reward: [(0, '24.005')]
[2025-01-10 11:05:53,741][03568] Updated weights for policy 0, policy_version 1710 (0.0023)
[2025-01-10 11:05:53,947][01485] Fps is (10 sec: 4914.8, 60 sec: 4164.2, 300 sec: 4054.3). Total num frames: 7004160. Throughput: 0: 1031.3. Samples: 1748896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:05:53,950][01485] Avg episode reward: [(0, '22.408')]
[2025-01-10 11:05:58,948][01485] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 4012.7). Total num frames: 7016448. Throughput: 0: 1016.7. Samples: 1754134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:05:58,955][01485] Avg episode reward: [(0, '22.814')]
[2025-01-10 11:06:03,946][01485] Fps is (10 sec: 3686.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 7041024. Throughput: 0: 993.8. Samples: 1760202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 11:06:03,953][01485] Avg episode reward: [(0, '24.038')]
[2025-01-10 11:06:04,472][03568] Updated weights for policy 0, policy_version 1720 (0.0034)
[2025-01-10 11:06:08,946][01485] Fps is (10 sec: 4916.1, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 7065600. Throughput: 0: 1012.8. Samples: 1763808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 11:06:08,948][01485] Avg episode reward: [(0, '23.765')]
[2025-01-10 11:06:08,960][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001725_7065600.pth...
[2025-01-10 11:06:09,100][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001489_6098944.pth
[2025-01-10 11:06:13,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 4026.6). Total num frames: 7081984. Throughput: 0: 1044.1. Samples: 1770094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:06:13,951][01485] Avg episode reward: [(0, '23.907')]
[2025-01-10 11:06:14,757][03568] Updated weights for policy 0, policy_version 1730 (0.0026)
[2025-01-10 11:06:18,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 7098368. Throughput: 0: 990.6. Samples: 1775038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:06:18,948][01485] Avg episode reward: [(0, '25.006')]
[2025-01-10 11:06:23,943][03568] Updated weights for policy 0, policy_version 1740 (0.0029)
[2025-01-10 11:06:23,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 7127040. Throughput: 0: 993.0. Samples: 1778790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:06:23,953][01485] Avg episode reward: [(0, '24.688')]
[2025-01-10 11:06:28,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 7147520. Throughput: 0: 1057.9. Samples: 1786128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 11:06:28,953][01485] Avg episode reward: [(0, '23.332')]
[2025-01-10 11:06:33,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 7159808. Throughput: 0: 1002.5. Samples: 1790496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:06:33,954][01485] Avg episode reward: [(0, '23.130')]
[2025-01-10 11:06:35,277][03568] Updated weights for policy 0, policy_version 1750 (0.0019)
[2025-01-10 11:06:38,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 4040.6). Total num frames: 7184384. Throughput: 0: 996.1. Samples: 1793720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:06:38,953][01485] Avg episode reward: [(0, '22.078')]
[2025-01-10 11:06:43,660][03568] Updated weights for policy 0, policy_version 1760 (0.0017)
[2025-01-10 11:06:43,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4054.4). Total num frames: 7208960. Throughput: 0: 1041.4. Samples: 1800994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:06:43,948][01485] Avg episode reward: [(0, '22.181')]
[2025-01-10 11:06:48,946][01485] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 7221248. Throughput: 0: 1023.6. Samples: 1806266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:06:48,951][01485] Avg episode reward: [(0, '23.199')]
[2025-01-10 11:06:53,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 7245824. Throughput: 0: 995.7. Samples: 1808614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 11:06:53,954][01485] Avg episode reward: [(0, '23.469')]
[2025-01-10 11:06:54,822][03568] Updated weights for policy 0, policy_version 1770 (0.0030)
[2025-01-10 11:06:58,946][01485] Fps is (10 sec: 4915.4, 60 sec: 4232.7, 300 sec: 4054.3). Total num frames: 7270400. Throughput: 0: 1018.9. Samples: 1815944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:06:58,954][01485] Avg episode reward: [(0, '23.094')]
[2025-01-10 11:07:03,947][01485] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 7286784. Throughput: 0: 1048.4. Samples: 1822218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:07:03,951][01485] Avg episode reward: [(0, '24.083')]
[2025-01-10 11:07:04,433][03568] Updated weights for policy 0, policy_version 1780 (0.0022)
[2025-01-10 11:07:08,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 7303168. Throughput: 0: 1012.3. Samples: 1824344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 11:07:08,956][01485] Avg episode reward: [(0, '24.340')]
[2025-01-10 11:07:13,946][01485] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 7327744. Throughput: 0: 994.8. Samples: 1830894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:07:13,953][01485] Avg episode reward: [(0, '22.079')]
[2025-01-10 11:07:14,440][03568] Updated weights for policy 0, policy_version 1790 (0.0017)
[2025-01-10 11:07:18,952][01485] Fps is (10 sec: 4503.2, 60 sec: 4163.9, 300 sec: 4026.5). Total num frames: 7348224. Throughput: 0: 1054.6. Samples: 1837960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 11:07:18,954][01485] Avg episode reward: [(0, '21.338')]
[2025-01-10 11:07:23,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 7364608. Throughput: 0: 1032.9. Samples: 1840202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:07:23,949][01485] Avg episode reward: [(0, '22.555')]
[2025-01-10 11:07:25,540][03568] Updated weights for policy 0, policy_version 1800 (0.0022)
[2025-01-10 11:07:28,946][01485] Fps is (10 sec: 4098.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 7389184. Throughput: 0: 996.0. Samples: 1845814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:07:28,949][01485] Avg episode reward: [(0, '23.132')]
[2025-01-10 11:07:33,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 7405568. Throughput: 0: 1014.7. Samples: 1851926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:07:33,953][01485] Avg episode reward: [(0, '23.703')]
[2025-01-10 11:07:36,375][03568] Updated weights for policy 0, policy_version 1810 (0.0021)
[2025-01-10 11:07:38,947][01485] Fps is (10 sec: 2867.0, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 7417856. Throughput: 0: 1006.4. Samples: 1853904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:07:38,950][01485] Avg episode reward: [(0, '23.277')]
[2025-01-10 11:07:43,946][01485] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3971.1). Total num frames: 7434240. Throughput: 0: 931.3. Samples: 1857852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:07:43,952][01485] Avg episode reward: [(0, '23.808')]
[2025-01-10 11:07:47,895][03568] Updated weights for policy 0, policy_version 1820 (0.0018)
[2025-01-10 11:07:48,946][01485] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 7458816. Throughput: 0: 942.8. Samples: 1864642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:07:48,949][01485] Avg episode reward: [(0, '25.718')]
[2025-01-10 11:07:53,955][01485] Fps is (10 sec: 4501.7, 60 sec: 3890.6, 300 sec: 3998.7). Total num frames: 7479296. Throughput: 0: 974.7. Samples: 1868214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:07:53,957][01485] Avg episode reward: [(0, '25.115')]
[2025-01-10 11:07:58,555][03568] Updated weights for policy 0, policy_version 1830 (0.0034)
[2025-01-10 11:07:58,948][01485] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3971.0). Total num frames: 7495680. Throughput: 0: 946.9. Samples: 1873506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 11:07:58,951][01485] Avg episode reward: [(0, '25.243')]
[2025-01-10 11:08:03,946][01485] Fps is (10 sec: 3689.6, 60 sec: 3823.0, 300 sec: 3998.9). Total num frames: 7516160. Throughput: 0: 924.8. Samples: 1879572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:03,949][01485] Avg episode reward: [(0, '25.006')]
[2025-01-10 11:08:07,547][03568] Updated weights for policy 0, policy_version 1840 (0.0036)
[2025-01-10 11:08:08,946][01485] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 4012.8). Total num frames: 7540736. Throughput: 0: 953.2. Samples: 1883096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:08:08,951][01485] Avg episode reward: [(0, '25.837')]
[2025-01-10 11:08:08,968][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001841_7540736.pth...
[2025-01-10 11:08:09,103][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001605_6574080.pth
[2025-01-10 11:08:13,950][01485] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3971.0). Total num frames: 7557120. Throughput: 0: 965.0. Samples: 1889244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:08:13,956][01485] Avg episode reward: [(0, '24.314')]
[2025-01-10 11:08:18,796][03568] Updated weights for policy 0, policy_version 1850 (0.0022)
[2025-01-10 11:08:18,946][01485] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3984.9). Total num frames: 7577600. Throughput: 0: 939.0. Samples: 1894180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 11:08:18,949][01485] Avg episode reward: [(0, '24.112')]
[2025-01-10 11:08:23,946][01485] Fps is (10 sec: 4507.1, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 7602176. Throughput: 0: 976.1. Samples: 1897826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 11:08:23,949][01485] Avg episode reward: [(0, '24.564')]
[2025-01-10 11:08:27,332][03568] Updated weights for policy 0, policy_version 1860 (0.0015)
[2025-01-10 11:08:28,948][01485] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 4012.7). Total num frames: 7622656. Throughput: 0: 1050.4. Samples: 1905122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:28,952][01485] Avg episode reward: [(0, '24.151')]
[2025-01-10 11:08:33,946][01485] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4012.7). Total num frames: 7634944. Throughput: 0: 995.8. Samples: 1909452. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-01-10 11:08:33,953][01485] Avg episode reward: [(0, '24.470')]
[2025-01-10 11:08:38,168][03568] Updated weights for policy 0, policy_version 1870 (0.0027)
[2025-01-10 11:08:38,946][01485] Fps is (10 sec: 3686.9, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 7659520. Throughput: 0: 993.0. Samples: 1912890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:38,951][01485] Avg episode reward: [(0, '25.634')]
[2025-01-10 11:08:43,946][01485] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4054.4). Total num frames: 7684096. Throughput: 0: 1035.8. Samples: 1920114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:43,949][01485] Avg episode reward: [(0, '25.524')]
[2025-01-10 11:08:48,739][03568] Updated weights for policy 0, policy_version 1880 (0.0025)
[2025-01-10 11:08:48,947][01485] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 7700480. Throughput: 0: 1013.6. Samples: 1925184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:48,951][01485] Avg episode reward: [(0, '25.182')]
[2025-01-10 11:08:53,946][01485] Fps is (10 sec: 3686.4, 60 sec: 4028.3, 300 sec: 4040.5). Total num frames: 7720960. Throughput: 0: 989.6. Samples: 1927630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:53,950][01485] Avg episode reward: [(0, '25.765')]
[2025-01-10 11:08:57,924][03568] Updated weights for policy 0, policy_version 1890 (0.0026)
[2025-01-10 11:08:58,946][01485] Fps is (10 sec: 4505.8, 60 sec: 4164.4, 300 sec: 4054.3). Total num frames: 7745536. Throughput: 0: 1015.0. Samples: 1934914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:08:58,948][01485] Avg episode reward: [(0, '27.493')]
[2025-01-10 11:09:03,946][01485] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 7761920. Throughput: 0: 1042.7. Samples: 1941100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:09:03,949][01485] Avg episode reward: [(0, '26.678')]
[2025-01-10 11:09:08,947][01485] Fps is (10 sec: 3276.7, 60 sec: 3959.4, 300 sec: 4012.7). Total num frames: 7778304. Throughput: 0: 1009.0. Samples: 1943232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 11:09:08,949][01485] Avg episode reward: [(0, '25.091')]
[2025-01-10 11:09:09,114][03568] Updated weights for policy 0, policy_version 1900 (0.0019)
[2025-01-10 11:09:13,946][01485] Fps is (10 sec: 4096.2, 60 sec: 4096.2, 300 sec: 4054.3). Total num frames: 7802880. Throughput: 0: 995.8. Samples: 1949932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:09:13,948][01485] Avg episode reward: [(0, '25.253')]
[2025-01-10 11:09:17,535][03568] Updated weights for policy 0, policy_version 1910 (0.0017)
[2025-01-10 11:09:18,946][01485] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 7823360. Throughput: 0: 1052.0. Samples: 1956790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-01-10 11:09:18,950][01485] Avg episode reward: [(0, '25.021')]
[2025-01-10 11:09:23,950][01485] Fps is (10 sec: 3684.8, 60 sec: 3959.2, 300 sec: 4012.6). Total num frames: 7839744. Throughput: 0: 1023.5. Samples: 1958950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 11:09:23,953][01485] Avg episode reward: [(0, '24.410')]
[2025-01-10 11:09:28,938][03568] Updated weights for policy 0, policy_version 1920 (0.0013)
[2025-01-10 11:09:28,946][01485] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 7864320. Throughput: 0: 989.6. Samples: 1964646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-01-10 11:09:28,949][01485] Avg episode reward: [(0, '23.526')]
[2025-01-10 11:09:33,946][01485] Fps is (10 sec: 4917.2, 60 sec: 4232.5, 300 sec: 4054.3). Total num frames: 7888896. Throughput: 0: 1036.7. Samples: 1971836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:09:33,949][01485] Avg episode reward: [(0, '25.160')]
[2025-01-10 11:09:38,946][01485] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 7901184. Throughput: 0: 1046.0. Samples: 1974700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-01-10 11:09:38,955][01485] Avg episode reward: [(0, '25.151')]
[2025-01-10 11:09:39,162][03568] Updated weights for policy 0, policy_version 1930 (0.0014)
[2025-01-10 11:09:43,946][01485] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 7921664. Throughput: 0: 988.1. Samples: 1979380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-01-10 11:09:43,953][01485] Avg episode reward: [(0, '26.450')]
[2025-01-10 11:09:48,596][03568] Updated weights for policy 0, policy_version 1940 (0.0025)
[2025-01-10 11:09:48,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 7946240. Throughput: 0: 1012.1. Samples: 1986646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-10 11:09:48,954][01485] Avg episode reward: [(0, '25.490')]
[2025-01-10 11:09:53,946][01485] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 7966720. Throughput: 0: 1045.8. Samples: 1990292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 11:09:53,948][01485] Avg episode reward: [(0, '24.542')]
[2025-01-10 11:09:58,946][01485] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 7983104. Throughput: 0: 997.9. Samples: 1994836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-01-10 11:09:58,949][01485] Avg episode reward: [(0, '25.707')]
[2025-01-10 11:09:59,712][03568] Updated weights for policy 0, policy_version 1950 (0.0026)
[2025-01-10 11:10:03,808][03555] Stopping Batcher_0...
[2025-01-10 11:10:03,809][03555] Loop batcher_evt_loop terminating...
[2025-01-10 11:10:03,811][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-01-10 11:10:03,810][01485] Component Batcher_0 stopped!
[2025-01-10 11:10:03,892][03568] Weights refcount: 2 0
[2025-01-10 11:10:03,894][01485] Component InferenceWorker_p0-w0 stopped!
[2025-01-10 11:10:03,898][03568] Stopping InferenceWorker_p0-w0...
[2025-01-10 11:10:03,899][03568] Loop inference_proc0-0_evt_loop terminating...
[2025-01-10 11:10:03,960][03555] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001725_7065600.pth
[2025-01-10 11:10:03,970][03555] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-01-10 11:10:04,155][01485] Component LearnerWorker_p0 stopped!
[2025-01-10 11:10:04,159][03555] Stopping LearnerWorker_p0...
[2025-01-10 11:10:04,159][03555] Loop learner_proc0_evt_loop terminating...
[2025-01-10 11:10:04,185][03575] Stopping RolloutWorker_w7...
[2025-01-10 11:10:04,185][01485] Component RolloutWorker_w7 stopped!
[2025-01-10 11:10:04,186][03575] Loop rollout_proc7_evt_loop terminating...
[2025-01-10 11:10:04,196][01485] Component RolloutWorker_w5 stopped!
[2025-01-10 11:10:04,203][03573] Stopping RolloutWorker_w5...
[2025-01-10 11:10:04,203][03573] Loop rollout_proc5_evt_loop terminating...
[2025-01-10 11:10:04,208][01485] Component RolloutWorker_w3 stopped!
[2025-01-10 11:10:04,213][03572] Stopping RolloutWorker_w3...
[2025-01-10 11:10:04,217][01485] Component RolloutWorker_w1 stopped!
[2025-01-10 11:10:04,224][03570] Stopping RolloutWorker_w1...
[2025-01-10 11:10:04,225][03570] Loop rollout_proc1_evt_loop terminating...
[2025-01-10 11:10:04,214][03572] Loop rollout_proc3_evt_loop terminating...
[2025-01-10 11:10:04,346][03574] Stopping RolloutWorker_w4...
[2025-01-10 11:10:04,346][03574] Loop rollout_proc4_evt_loop terminating...
[2025-01-10 11:10:04,345][01485] Component RolloutWorker_w4 stopped!
[2025-01-10 11:10:04,360][01485] Component RolloutWorker_w0 stopped!
[2025-01-10 11:10:04,361][03569] Stopping RolloutWorker_w0...
[2025-01-10 11:10:04,362][03569] Loop rollout_proc0_evt_loop terminating...
[2025-01-10 11:10:04,384][01485] Component RolloutWorker_w2 stopped!
[2025-01-10 11:10:04,387][03571] Stopping RolloutWorker_w2...
[2025-01-10 11:10:04,390][03571] Loop rollout_proc2_evt_loop terminating...
[2025-01-10 11:10:04,392][01485] Component RolloutWorker_w6 stopped!
[2025-01-10 11:10:04,396][01485] Waiting for process learner_proc0 to stop...
[2025-01-10 11:10:04,401][03576] Stopping RolloutWorker_w6...
[2025-01-10 11:10:04,402][03576] Loop rollout_proc6_evt_loop terminating...
[2025-01-10 11:10:05,700][01485] Waiting for process inference_proc0-0 to join...
[2025-01-10 11:10:05,706][01485] Waiting for process rollout_proc0 to join...
[2025-01-10 11:10:07,617][01485] Waiting for process rollout_proc1 to join...
[2025-01-10 11:10:07,626][01485] Waiting for process rollout_proc2 to join...
[2025-01-10 11:10:07,630][01485] Waiting for process rollout_proc3 to join...
[2025-01-10 11:10:07,633][01485] Waiting for process rollout_proc4 to join...
[2025-01-10 11:10:07,638][01485] Waiting for process rollout_proc5 to join...
[2025-01-10 11:10:07,641][01485] Waiting for process rollout_proc6 to join...
[2025-01-10 11:10:07,644][01485] Waiting for process rollout_proc7 to join...
[2025-01-10 11:10:07,650][01485] Batcher 0 profile tree view:
batching: 52.8690, releasing_batches: 0.0508
[2025-01-10 11:10:07,652][01485] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 772.3879
update_model: 16.7651
  weight_update: 0.0013
one_step: 0.0025
  handle_policy_step: 1133.7221
    deserialize: 28.1482, stack: 6.2217, obs_to_device_normalize: 245.7290, forward: 565.9922, send_messages: 56.2163
    prepare_outputs: 174.0739
      to_cpu: 105.7731
[2025-01-10 11:10:07,654][01485] Learner 0 profile tree view:
misc: 0.0096, prepare_batch: 24.8832
train: 142.4482
  epoch_init: 0.0140, minibatch_init: 0.0167, losses_postprocess: 1.2208, kl_divergence: 1.1859, after_optimizer: 66.5056
  calculate_losses: 50.0216
    losses_init: 0.0072, forward_head: 2.4030, bptt_initial: 33.4288, tail: 1.9341, advantages_returns: 0.6098, losses: 7.3678
    bptt: 3.6946
      bptt_forward_core: 3.5015
  update: 22.1431
    clip: 1.6630
[2025-01-10 11:10:07,656][01485] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6851, enqueue_policy_requests: 180.6486, env_step: 1584.6989, overhead: 24.3448, complete_rollouts: 14.3137
save_policy_outputs: 40.7447
  split_output_tensors: 16.8325
[2025-01-10 11:10:07,657][01485] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.5950, enqueue_policy_requests: 186.2079, env_step: 1581.5339, overhead: 24.5694, complete_rollouts: 13.0413
save_policy_outputs: 40.3386
  split_output_tensors: 16.2947
[2025-01-10 11:10:07,659][01485] Loop Runner_EvtLoop terminating...
[2025-01-10 11:10:07,660][01485] Runner profile tree view: main_loop: 2033.7339 [2025-01-10 11:10:07,662][01485] Collected {0: 8007680}, FPS: 3937.4 [2025-01-10 11:11:48,966][01485] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-10 11:11:48,968][01485] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-10 11:11:48,970][01485] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-10 11:11:48,971][01485] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-10 11:11:48,973][01485] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-10 11:11:48,975][01485] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-10 11:11:48,976][01485] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-10 11:11:48,978][01485] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-10 11:11:48,979][01485] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-10 11:11:48,980][01485] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-10 11:11:48,981][01485] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-10 11:11:48,984][01485] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-10 11:11:48,985][01485] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-10 11:11:48,987][01485] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-10 11:11:48,989][01485] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-10 11:11:49,019][01485] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-10 11:11:49,023][01485] RunningMeanStd input shape: (3, 72, 128) [2025-01-10 11:11:49,026][01485] RunningMeanStd input shape: (1,) [2025-01-10 11:11:49,042][01485] ConvEncoder: input_channels=3 [2025-01-10 11:11:49,167][01485] Conv encoder output size: 512 [2025-01-10 11:11:49,168][01485] Policy head output size: 512 [2025-01-10 11:11:49,338][01485] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2025-01-10 11:11:50,116][01485] Num frames 100... [2025-01-10 11:11:50,237][01485] Num frames 200... [2025-01-10 11:11:50,375][01485] Num frames 300... [2025-01-10 11:11:50,495][01485] Num frames 400... [2025-01-10 11:11:50,625][01485] Num frames 500... [2025-01-10 11:11:50,758][01485] Num frames 600... [2025-01-10 11:11:50,887][01485] Num frames 700... [2025-01-10 11:11:51,008][01485] Num frames 800... [2025-01-10 11:11:51,139][01485] Avg episode rewards: #0: 16.580, true rewards: #0: 8.580 [2025-01-10 11:11:51,141][01485] Avg episode reward: 16.580, avg true_objective: 8.580 [2025-01-10 11:11:51,194][01485] Num frames 900... [2025-01-10 11:11:51,332][01485] Num frames 1000... [2025-01-10 11:11:51,500][01485] Num frames 1100... [2025-01-10 11:11:51,674][01485] Num frames 1200... [2025-01-10 11:11:51,849][01485] Num frames 1300... [2025-01-10 11:11:51,971][01485] Avg episode rewards: #0: 12.190, true rewards: #0: 6.690 [2025-01-10 11:11:51,975][01485] Avg episode reward: 12.190, avg true_objective: 6.690 [2025-01-10 11:11:52,086][01485] Num frames 1400... [2025-01-10 11:11:52,246][01485] Num frames 1500... [2025-01-10 11:11:52,408][01485] Num frames 1600... [2025-01-10 11:11:52,592][01485] Num frames 1700... [2025-01-10 11:11:52,762][01485] Num frames 1800... [2025-01-10 11:11:52,938][01485] Num frames 1900... 
[2025-01-10 11:11:53,122][01485] Num frames 2000... [2025-01-10 11:11:53,291][01485] Num frames 2100... [2025-01-10 11:11:53,465][01485] Num frames 2200... [2025-01-10 11:11:53,681][01485] Avg episode rewards: #0: 15.630, true rewards: #0: 7.630 [2025-01-10 11:11:53,683][01485] Avg episode reward: 15.630, avg true_objective: 7.630 [2025-01-10 11:11:53,705][01485] Num frames 2300... [2025-01-10 11:11:53,838][01485] Num frames 2400... [2025-01-10 11:11:53,963][01485] Num frames 2500... [2025-01-10 11:11:54,090][01485] Num frames 2600... [2025-01-10 11:11:54,212][01485] Num frames 2700... [2025-01-10 11:11:54,333][01485] Num frames 2800... [2025-01-10 11:11:54,450][01485] Num frames 2900... [2025-01-10 11:11:54,573][01485] Num frames 3000... [2025-01-10 11:11:54,697][01485] Num frames 3100... [2025-01-10 11:11:54,818][01485] Num frames 3200... [2025-01-10 11:11:54,946][01485] Num frames 3300... [2025-01-10 11:11:55,072][01485] Num frames 3400... [2025-01-10 11:11:55,198][01485] Num frames 3500... [2025-01-10 11:11:55,321][01485] Num frames 3600... [2025-01-10 11:11:55,444][01485] Num frames 3700... [2025-01-10 11:11:55,568][01485] Num frames 3800... [2025-01-10 11:11:55,690][01485] Num frames 3900... [2025-01-10 11:11:55,811][01485] Num frames 4000... [2025-01-10 11:11:55,888][01485] Avg episode rewards: #0: 21.792, true rewards: #0: 10.042 [2025-01-10 11:11:55,890][01485] Avg episode reward: 21.792, avg true_objective: 10.042 [2025-01-10 11:11:55,997][01485] Num frames 4100... [2025-01-10 11:11:56,123][01485] Num frames 4200... [2025-01-10 11:11:56,244][01485] Num frames 4300... [2025-01-10 11:11:56,364][01485] Num frames 4400... [2025-01-10 11:11:56,482][01485] Num frames 4500... [2025-01-10 11:11:56,607][01485] Num frames 4600... [2025-01-10 11:11:56,732][01485] Num frames 4700... [2025-01-10 11:11:56,852][01485] Num frames 4800... [2025-01-10 11:11:56,979][01485] Num frames 4900... [2025-01-10 11:11:57,109][01485] Num frames 5000... 
[2025-01-10 11:11:57,229][01485] Num frames 5100... [2025-01-10 11:11:57,354][01485] Num frames 5200... [2025-01-10 11:11:57,473][01485] Num frames 5300... [2025-01-10 11:11:57,596][01485] Num frames 5400... [2025-01-10 11:11:57,685][01485] Avg episode rewards: #0: 23.450, true rewards: #0: 10.850 [2025-01-10 11:11:57,688][01485] Avg episode reward: 23.450, avg true_objective: 10.850 [2025-01-10 11:11:57,776][01485] Num frames 5500... [2025-01-10 11:11:57,898][01485] Num frames 5600... [2025-01-10 11:11:58,035][01485] Num frames 5700... [2025-01-10 11:11:58,162][01485] Num frames 5800... [2025-01-10 11:11:58,286][01485] Num frames 5900... [2025-01-10 11:11:58,406][01485] Num frames 6000... [2025-01-10 11:11:58,529][01485] Num frames 6100... [2025-01-10 11:11:58,655][01485] Num frames 6200... [2025-01-10 11:11:58,774][01485] Num frames 6300... [2025-01-10 11:11:58,895][01485] Num frames 6400... [2025-01-10 11:11:59,015][01485] Num frames 6500... [2025-01-10 11:11:59,136][01485] Avg episode rewards: #0: 24.075, true rewards: #0: 10.908 [2025-01-10 11:11:59,137][01485] Avg episode reward: 24.075, avg true_objective: 10.908 [2025-01-10 11:11:59,204][01485] Num frames 6600... [2025-01-10 11:11:59,319][01485] Num frames 6700... [2025-01-10 11:11:59,434][01485] Num frames 6800... [2025-01-10 11:11:59,554][01485] Num frames 6900... [2025-01-10 11:11:59,675][01485] Num frames 7000... [2025-01-10 11:11:59,796][01485] Num frames 7100... [2025-01-10 11:11:59,914][01485] Avg episode rewards: #0: 21.933, true rewards: #0: 10.219 [2025-01-10 11:11:59,915][01485] Avg episode reward: 21.933, avg true_objective: 10.219 [2025-01-10 11:11:59,975][01485] Num frames 7200... [2025-01-10 11:12:00,115][01485] Num frames 7300... [2025-01-10 11:12:00,236][01485] Num frames 7400... [2025-01-10 11:12:00,357][01485] Num frames 7500... [2025-01-10 11:12:00,475][01485] Num frames 7600... [2025-01-10 11:12:00,597][01485] Num frames 7700... [2025-01-10 11:12:00,725][01485] Num frames 7800... 
[2025-01-10 11:12:00,850][01485] Num frames 7900... [2025-01-10 11:12:00,992][01485] Num frames 8000... [2025-01-10 11:12:01,130][01485] Num frames 8100... [2025-01-10 11:12:01,253][01485] Num frames 8200... [2025-01-10 11:12:01,358][01485] Avg episode rewards: #0: 22.301, true rewards: #0: 10.301 [2025-01-10 11:12:01,360][01485] Avg episode reward: 22.301, avg true_objective: 10.301 [2025-01-10 11:12:01,431][01485] Num frames 8300... [2025-01-10 11:12:01,553][01485] Num frames 8400... [2025-01-10 11:12:01,676][01485] Num frames 8500... [2025-01-10 11:12:01,796][01485] Num frames 8600... [2025-01-10 11:12:01,918][01485] Num frames 8700... [2025-01-10 11:12:02,040][01485] Num frames 8800... [2025-01-10 11:12:02,177][01485] Num frames 8900... [2025-01-10 11:12:02,297][01485] Num frames 9000... [2025-01-10 11:12:02,419][01485] Num frames 9100... [2025-01-10 11:12:02,483][01485] Avg episode rewards: #0: 22.006, true rewards: #0: 10.117 [2025-01-10 11:12:02,484][01485] Avg episode reward: 22.006, avg true_objective: 10.117 [2025-01-10 11:12:02,600][01485] Num frames 9200... [2025-01-10 11:12:02,722][01485] Num frames 9300... [2025-01-10 11:12:02,843][01485] Num frames 9400... [2025-01-10 11:12:02,964][01485] Num frames 9500... [2025-01-10 11:12:03,035][01485] Avg episode rewards: #0: 20.311, true rewards: #0: 9.511 [2025-01-10 11:12:03,037][01485] Avg episode reward: 20.311, avg true_objective: 9.511 [2025-01-10 11:12:57,163][01485] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-01-10 11:14:35,083][01485] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-10 11:14:35,085][01485] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-10 11:14:35,087][01485] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-10 11:14:35,089][01485] Adding new argument 'save_video'=True that is not in the saved config file! 
[2025-01-10 11:14:35,090][01485] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-10 11:14:35,092][01485] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-10 11:14:35,094][01485] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-10 11:14:35,096][01485] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-10 11:14:35,097][01485] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-10 11:14:35,099][01485] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-10 11:14:35,100][01485] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-10 11:14:35,102][01485] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-10 11:14:35,103][01485] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-10 11:14:35,104][01485] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-10 11:14:35,106][01485] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-10 11:14:35,135][01485] RunningMeanStd input shape: (3, 72, 128) [2025-01-10 11:14:35,137][01485] RunningMeanStd input shape: (1,) [2025-01-10 11:14:35,150][01485] ConvEncoder: input_channels=3 [2025-01-10 11:14:35,188][01485] Conv encoder output size: 512 [2025-01-10 11:14:35,189][01485] Policy head output size: 512 [2025-01-10 11:14:35,209][01485] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2025-01-10 11:14:35,618][01485] Num frames 100... [2025-01-10 11:14:35,740][01485] Num frames 200... [2025-01-10 11:14:35,860][01485] Num frames 300... [2025-01-10 11:14:35,987][01485] Num frames 400... 
[2025-01-10 11:14:36,122][01485] Num frames 500... [2025-01-10 11:14:36,254][01485] Num frames 600... [2025-01-10 11:14:36,376][01485] Num frames 700... [2025-01-10 11:14:36,500][01485] Num frames 800... [2025-01-10 11:14:36,620][01485] Num frames 900... [2025-01-10 11:14:36,740][01485] Num frames 1000... [2025-01-10 11:14:36,857][01485] Num frames 1100... [2025-01-10 11:14:36,986][01485] Num frames 1200... [2025-01-10 11:14:37,113][01485] Num frames 1300... [2025-01-10 11:14:37,238][01485] Num frames 1400... [2025-01-10 11:14:37,360][01485] Num frames 1500... [2025-01-10 11:14:37,479][01485] Num frames 1600... [2025-01-10 11:14:37,603][01485] Num frames 1700... [2025-01-10 11:14:37,730][01485] Num frames 1800... [2025-01-10 11:14:37,851][01485] Avg episode rewards: #0: 46.559, true rewards: #0: 18.560 [2025-01-10 11:14:37,853][01485] Avg episode reward: 46.559, avg true_objective: 18.560 [2025-01-10 11:14:37,908][01485] Num frames 1900... [2025-01-10 11:14:38,040][01485] Num frames 2000... [2025-01-10 11:14:38,169][01485] Num frames 2100... [2025-01-10 11:14:39,514][01485] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-10 11:14:39,515][01485] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-10 11:14:39,517][01485] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-10 11:14:39,518][01485] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-10 11:14:39,521][01485] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-10 11:14:39,523][01485] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-10 11:14:39,525][01485] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-10 11:14:39,526][01485] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! 
[2025-01-10 11:14:39,527][01485] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-01-10 11:14:39,528][01485] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-01-10 11:14:39,529][01485] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-10 11:14:39,530][01485] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-10 11:14:39,531][01485] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-10 11:14:39,535][01485] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-10 11:14:39,536][01485] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-01-10 11:14:39,581][01485] RunningMeanStd input shape: (3, 72, 128)
[2025-01-10 11:14:39,583][01485] RunningMeanStd input shape: (1,)
[2025-01-10 11:14:39,601][01485] ConvEncoder: input_channels=3
[2025-01-10 11:14:39,659][01485] Conv encoder output size: 512
[2025-01-10 11:14:39,661][01485] Policy head output size: 512
[2025-01-10 11:14:39,689][01485] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-01-10 11:18:33,871][01485] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-01-10 11:18:33,872][01485] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-10 11:18:33,874][01485] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-10 11:18:33,875][01485] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-10 11:18:33,877][01485] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-10 11:18:33,878][01485] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-10 11:18:33,880][01485] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-01-10 11:18:33,881][01485] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-10 11:18:33,883][01485] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-01-10 11:18:33,884][01485] Adding new argument 'hf_repository'='HCho/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-01-10 11:18:33,886][01485] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-10 11:18:33,887][01485] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-10 11:18:33,889][01485] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-10 11:18:33,890][01485] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-10 11:18:33,892][01485] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-01-10 11:18:33,928][01485] RunningMeanStd input shape: (3, 72, 128)
[2025-01-10 11:18:33,930][01485] RunningMeanStd input shape: (1,)
[2025-01-10 11:18:33,944][01485] ConvEncoder: input_channels=3
[2025-01-10 11:18:33,993][01485] Conv encoder output size: 512
[2025-01-10 11:18:33,994][01485] Policy head output size: 512
[2025-01-10 11:18:34,016][01485] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-01-10 11:18:34,442][01485] Num frames 100...
[2025-01-10 11:18:34,563][01485] Num frames 200...
[2025-01-10 11:18:34,687][01485] Num frames 300...
[2025-01-10 11:18:34,808][01485] Num frames 400...
[2025-01-10 11:18:34,924][01485] Num frames 500...
[2025-01-10 11:18:35,054][01485] Num frames 600...
[2025-01-10 11:18:35,181][01485] Num frames 700...
[2025-01-10 11:18:35,300][01485] Num frames 800...
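The "RunningMeanStd input shape" records above refer to the running statistics used to normalize the pixel observations (shape `(3, 72, 128)`) and scalar returns (shape `(1,)`). A sketch of how such a tracker can work, using the parallel-variance (Chan et al.) batch update common in RL codebases — Sample Factory's actual `RunningMeanStd` differs in its details:

```python
import numpy as np


class RunningMeanStd:
    """Track a running mean and variance over batches of samples,
    so observations can be normalized with statistics seen so far."""

    def __init__(self, shape, epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # avoids division by zero before the first update

    def update(self, batch):
        """Merge a batch (leading axis = batch dim) into the running stats."""
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count

        # Parallel-variance merge of (mean, var, count) pairs.
        self.mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)


# Demo: stats converge to the true mean/std of the data stream.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=(10000, 1))
rms = RunningMeanStd(shape=(1,))
rms.update(data)
```

Keeping the statistics running (rather than fixed) matters here because the checkpoint restores them along with the weights, so evaluation sees observations normalized the same way they were during training.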
[2025-01-10 11:18:35,475][01485] Avg episode rewards: #0: 20.960, true rewards: #0: 8.960
[2025-01-10 11:18:35,476][01485] Avg episode reward: 20.960, avg true_objective: 8.960
[2025-01-10 11:18:35,484][01485] Num frames 900...
[2025-01-10 11:18:35,610][01485] Num frames 1000...
[2025-01-10 11:18:35,731][01485] Num frames 1100...
[2025-01-10 11:18:35,849][01485] Num frames 1200...
[2025-01-10 11:18:35,968][01485] Num frames 1300...
[2025-01-10 11:18:36,108][01485] Num frames 1400...
[2025-01-10 11:18:36,232][01485] Num frames 1500...
[2025-01-10 11:18:36,352][01485] Num frames 1600...
[2025-01-10 11:18:36,474][01485] Num frames 1700...
[2025-01-10 11:18:36,592][01485] Num frames 1800...
[2025-01-10 11:18:36,714][01485] Num frames 1900...
[2025-01-10 11:18:36,838][01485] Num frames 2000...
[2025-01-10 11:18:36,956][01485] Num frames 2100...
[2025-01-10 11:18:37,095][01485] Num frames 2200...
[2025-01-10 11:18:37,184][01485] Avg episode rewards: #0: 27.130, true rewards: #0: 11.130
[2025-01-10 11:18:37,186][01485] Avg episode reward: 27.130, avg true_objective: 11.130
[2025-01-10 11:18:37,274][01485] Num frames 2300...
[2025-01-10 11:18:37,394][01485] Num frames 2400...
[2025-01-10 11:18:37,514][01485] Num frames 2500...
[2025-01-10 11:18:37,632][01485] Num frames 2600...
[2025-01-10 11:18:37,757][01485] Num frames 2700...
[2025-01-10 11:18:37,877][01485] Num frames 2800...
[2025-01-10 11:18:38,000][01485] Num frames 2900...
[2025-01-10 11:18:38,142][01485] Num frames 3000...
[2025-01-10 11:18:38,272][01485] Num frames 3100...
[2025-01-10 11:18:38,394][01485] Num frames 3200...
[2025-01-10 11:18:38,515][01485] Num frames 3300...
[2025-01-10 11:18:38,650][01485] Num frames 3400...
[2025-01-10 11:18:38,777][01485] Num frames 3500...
[2025-01-10 11:18:38,904][01485] Num frames 3600...
[2025-01-10 11:18:39,041][01485] Avg episode rewards: #0: 28.554, true rewards: #0: 12.220
[2025-01-10 11:18:39,042][01485] Avg episode reward: 28.554, avg true_objective: 12.220
[2025-01-10 11:18:39,095][01485] Num frames 3700...
[2025-01-10 11:18:39,223][01485] Num frames 3800...
[2025-01-10 11:18:39,344][01485] Num frames 3900...
[2025-01-10 11:18:39,466][01485] Num frames 4000...
[2025-01-10 11:18:39,588][01485] Num frames 4100...
[2025-01-10 11:18:39,733][01485] Avg episode rewards: #0: 23.695, true rewards: #0: 10.445
[2025-01-10 11:18:39,735][01485] Avg episode reward: 23.695, avg true_objective: 10.445
[2025-01-10 11:18:39,763][01485] Num frames 4200...
[2025-01-10 11:18:39,886][01485] Num frames 4300...
[2025-01-10 11:18:40,008][01485] Num frames 4400...
[2025-01-10 11:18:40,144][01485] Num frames 4500...
[2025-01-10 11:18:40,265][01485] Num frames 4600...
[2025-01-10 11:18:40,384][01485] Num frames 4700...
[2025-01-10 11:18:40,506][01485] Num frames 4800...
[2025-01-10 11:18:40,624][01485] Num frames 4900...
[2025-01-10 11:18:40,792][01485] Num frames 5000...
[2025-01-10 11:18:40,960][01485] Num frames 5100...
[2025-01-10 11:18:41,130][01485] Num frames 5200...
[2025-01-10 11:18:41,297][01485] Num frames 5300...
[2025-01-10 11:18:41,417][01485] Avg episode rewards: #0: 24.678, true rewards: #0: 10.678
[2025-01-10 11:18:41,419][01485] Avg episode reward: 24.678, avg true_objective: 10.678
[2025-01-10 11:18:41,521][01485] Num frames 5400...
[2025-01-10 11:18:41,685][01485] Num frames 5500...
[2025-01-10 11:18:41,844][01485] Num frames 5600...
[2025-01-10 11:18:42,009][01485] Num frames 5700...
[2025-01-10 11:18:42,183][01485] Num frames 5800...
[2025-01-10 11:18:42,357][01485] Num frames 5900...
[2025-01-10 11:18:42,535][01485] Num frames 6000...
[2025-01-10 11:18:42,707][01485] Num frames 6100...
[2025-01-10 11:18:42,827][01485] Avg episode rewards: #0: 23.727, true rewards: #0: 10.227
[2025-01-10 11:18:42,829][01485] Avg episode reward: 23.727, avg true_objective: 10.227
[2025-01-10 11:18:42,937][01485] Num frames 6200...
[2025-01-10 11:18:43,095][01485] Num frames 6300...
[2025-01-10 11:18:43,216][01485] Num frames 6400...
[2025-01-10 11:18:43,341][01485] Num frames 6500...
[2025-01-10 11:18:43,460][01485] Num frames 6600...
[2025-01-10 11:18:43,579][01485] Num frames 6700...
[2025-01-10 11:18:43,702][01485] Num frames 6800...
[2025-01-10 11:18:43,826][01485] Num frames 6900...
[2025-01-10 11:18:43,955][01485] Num frames 7000...
[2025-01-10 11:18:44,007][01485] Avg episode rewards: #0: 23.143, true rewards: #0: 10.000
[2025-01-10 11:18:44,009][01485] Avg episode reward: 23.143, avg true_objective: 10.000
[2025-01-10 11:18:44,137][01485] Num frames 7100...
[2025-01-10 11:18:44,270][01485] Num frames 7200...
[2025-01-10 11:18:44,392][01485] Num frames 7300...
[2025-01-10 11:18:44,518][01485] Avg episode rewards: #0: 21.321, true rewards: #0: 9.196
[2025-01-10 11:18:44,519][01485] Avg episode reward: 21.321, avg true_objective: 9.196
[2025-01-10 11:18:44,574][01485] Num frames 7400...
[2025-01-10 11:18:44,696][01485] Num frames 7500...
[2025-01-10 11:18:44,816][01485] Num frames 7600...
[2025-01-10 11:18:44,937][01485] Num frames 7700...
[2025-01-10 11:18:45,061][01485] Num frames 7800...
[2025-01-10 11:18:45,193][01485] Num frames 7900...
[2025-01-10 11:18:45,322][01485] Num frames 8000...
[2025-01-10 11:18:45,445][01485] Num frames 8100...
[2025-01-10 11:18:45,570][01485] Num frames 8200...
[2025-01-10 11:18:45,689][01485] Num frames 8300...
[2025-01-10 11:18:45,811][01485] Num frames 8400...
[2025-01-10 11:18:45,931][01485] Num frames 8500...
[2025-01-10 11:18:46,054][01485] Avg episode rewards: #0: 22.059, true rewards: #0: 9.503
[2025-01-10 11:18:46,056][01485] Avg episode reward: 22.059, avg true_objective: 9.503
[2025-01-10 11:18:46,122][01485] Num frames 8600...
[2025-01-10 11:18:46,243][01485] Num frames 8700...
[2025-01-10 11:18:46,372][01485] Num frames 8800...
[2025-01-10 11:18:46,493][01485] Num frames 8900...
[2025-01-10 11:18:46,618][01485] Num frames 9000...
[2025-01-10 11:18:46,740][01485] Num frames 9100...
[2025-01-10 11:18:46,860][01485] Num frames 9200...
[2025-01-10 11:18:46,985][01485] Num frames 9300...
[2025-01-10 11:18:47,113][01485] Num frames 9400...
[2025-01-10 11:18:47,191][01485] Avg episode rewards: #0: 21.717, true rewards: #0: 9.417
[2025-01-10 11:18:47,193][01485] Avg episode reward: 21.717, avg true_objective: 9.417
[2025-01-10 11:19:43,044][01485] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
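The "Avg episode rewards" records are cumulative running averages, so with `max_num_episodes=10` the last such record before the replay is saved is the final 10-episode result. To pull that number out of a log like this one, a small parser can match those records; the regex below assumes the exact record format shown above:

```python
import re

# Matches records like:
# [2025-01-10 11:18:47,191][01485] Avg episode rewards: #0: 21.717, true rewards: #0: 9.417
PATTERN = re.compile(
    r"Avg episode rewards: #0: ([\d.]+), true rewards: #0: ([\d.]+)"
)


def final_averages(log_text):
    """Return the last (avg_reward, avg_true_reward) pair in the log, or None."""
    matches = PATTERN.findall(log_text)
    if not matches:
        return None
    reward, true_reward = matches[-1]
    return float(reward), float(true_reward)


sample = "[2025-01-10 11:18:47,191][01485] Avg episode rewards: #0: 21.717, true rewards: #0: 9.417"
print(final_averages(sample))  # (21.717, 9.417)
```

Taking the last match (rather than averaging all matches) is deliberate: each record already averages every episode completed so far, so the final record is the summary of the whole evaluation.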