[2025-02-14 07:22:25,511][00436] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-14 07:22:25,513][00436] Rollout worker 0 uses device cpu
[2025-02-14 07:22:25,515][00436] Rollout worker 1 uses device cpu
[2025-02-14 07:22:25,517][00436] Rollout worker 2 uses device cpu
[2025-02-14 07:22:25,518][00436] Rollout worker 3 uses device cpu
[2025-02-14 07:22:25,519][00436] Rollout worker 4 uses device cpu
[2025-02-14 07:22:25,520][00436] Rollout worker 5 uses device cpu
[2025-02-14 07:22:25,521][00436] Rollout worker 6 uses device cpu
[2025-02-14 07:22:25,522][00436] Rollout worker 7 uses device cpu
[2025-02-14 07:22:25,670][00436] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:22:25,672][00436] InferenceWorker_p0-w0: min num requests: 2
[2025-02-14 07:22:25,705][00436] Starting all processes...
[2025-02-14 07:22:25,706][00436] Starting process learner_proc0
[2025-02-14 07:22:25,763][00436] Starting all processes...
[2025-02-14 07:22:25,777][00436] Starting process inference_proc0-0
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc0
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc1
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc2
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc3
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc4
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc5
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc6
[2025-02-14 07:22:25,777][00436] Starting process rollout_proc7
[2025-02-14 07:22:42,193][04608] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:22:42,196][04608] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-14 07:22:42,328][04622] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:22:42,329][04622] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-14 07:22:42,336][04608] Num visible devices: 1
[2025-02-14 07:22:42,359][04622] Num visible devices: 1
[2025-02-14 07:22:42,368][04608] Starting seed is not provided
[2025-02-14 07:22:42,368][04608] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:22:42,369][04608] Initializing actor-critic model on device cuda:0
[2025-02-14 07:22:42,370][04608] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:22:42,373][04608] RunningMeanStd input shape: (1,)
[2025-02-14 07:22:42,412][04608] ConvEncoder: input_channels=3
[2025-02-14 07:22:42,426][04627] Worker 5 uses CPU cores [1]
[2025-02-14 07:22:42,471][04624] Worker 2 uses CPU cores [0]
[2025-02-14 07:22:42,557][04629] Worker 7 uses CPU cores [1]
[2025-02-14 07:22:42,655][04623] Worker 1 uses CPU cores [1]
[2025-02-14 07:22:42,771][04625] Worker 3 uses CPU cores [1]
[2025-02-14 07:22:42,873][04626] Worker 4 uses CPU cores [0]
[2025-02-14 07:22:42,916][04621] Worker 0 uses CPU cores [0]
[2025-02-14 07:22:42,932][04608] Conv encoder output size: 512
[2025-02-14 07:22:42,932][04608] Policy head output size: 512
[2025-02-14 07:22:42,956][04628] Worker 6 uses CPU cores [0]
[2025-02-14 07:22:42,998][04608] Created Actor Critic model with architecture:
[2025-02-14 07:22:42,999][04608] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
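The three Conv2d/ELU pairs plus the single Linear/ELU pair in the dump above match Sample Factory's default `convnet_simple` encoder. Assuming its usual kernel/stride settings (8/4, 4/2, 3/2 — an assumption, since the dump omits them), the flattened `conv_head` output for the logged (3, 72, 128) observations can be checked with a few lines of arithmetic, confirming where the "Conv encoder output size: 512" comes from:

```python
def conv_out(size, kernel, stride, padding=0):
    # Standard formula for the output length of a Conv2d layer along one axis.
    return (size + 2 * padding - kernel) // stride + 1

h, w = 72, 128  # resized Doom frames, per the "resize resolution: (128, 72)" lines
# (kernel, stride, out_channels) per layer -- assumed convnet_simple defaults
layers = [(8, 4, 32), (4, 2, 64), (3, 2, 128)]
for k, s, _c in layers:
    h, w = conv_out(h, k, s), conv_out(w, k, s)

flat = 128 * h * w  # flattened conv_head output: 128 * 3 * 6 = 2304
print(h, w, flat)   # 3 6 2304
# The mlp_layers Linear then maps 2304 -> 512, which is the logged
# "Conv encoder output size" and the GRU core's input width.
```

The 512-unit policy head and the `GRU(512, 512)` core in the architecture dump are consistent with this: the encoder's final Linear/ELU projects the 2304 flattened features down to 512 before the recurrent core.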
[2025-02-14 07:22:43,249][04608] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-14 07:22:45,671][00436] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-14 07:22:45,679][00436] Heartbeat connected on RolloutWorker_w0
[2025-02-14 07:22:45,683][00436] Heartbeat connected on RolloutWorker_w1
[2025-02-14 07:22:45,686][00436] Heartbeat connected on RolloutWorker_w2
[2025-02-14 07:22:45,690][00436] Heartbeat connected on RolloutWorker_w3
[2025-02-14 07:22:45,693][00436] Heartbeat connected on RolloutWorker_w4
[2025-02-14 07:22:45,697][00436] Heartbeat connected on RolloutWorker_w5
[2025-02-14 07:22:45,700][00436] Heartbeat connected on RolloutWorker_w6
[2025-02-14 07:22:45,704][00436] Heartbeat connected on RolloutWorker_w7
[2025-02-14 07:22:45,973][00436] Heartbeat connected on Batcher_0
[2025-02-14 07:22:47,317][04608] No checkpoints found
[2025-02-14 07:22:47,317][04608] Did not load from checkpoint, starting from scratch!
[2025-02-14 07:22:47,318][04608] Initialized policy 0 weights for model version 0
[2025-02-14 07:22:47,322][04608] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:22:47,330][04608] LearnerWorker_p0 finished initialization!
[2025-02-14 07:22:47,331][00436] Heartbeat connected on LearnerWorker_p0
[2025-02-14 07:22:47,503][04622] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:22:47,504][04622] RunningMeanStd input shape: (1,)
[2025-02-14 07:22:47,515][04622] ConvEncoder: input_channels=3
[2025-02-14 07:22:47,617][04622] Conv encoder output size: 512
[2025-02-14 07:22:47,617][04622] Policy head output size: 512
[2025-02-14 07:22:47,652][00436] Inference worker 0-0 is ready!
[2025-02-14 07:22:47,654][00436] All inference workers are ready! Signal rollout workers to start!
[2025-02-14 07:22:47,944][04624] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:47,954][04625] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:47,976][04628] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:47,977][04627] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:47,982][04629] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:47,980][04621] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:48,031][04623] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:48,115][04626] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:22:49,108][04629] Decorrelating experience for 0 frames...
[2025-02-14 07:22:49,176][04621] Decorrelating experience for 0 frames...
[2025-02-14 07:22:49,175][04628] Decorrelating experience for 0 frames...
[2025-02-14 07:22:49,178][04624] Decorrelating experience for 0 frames...
[2025-02-14 07:22:49,240][00436] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-14 07:22:49,583][04627] Decorrelating experience for 0 frames...
[2025-02-14 07:22:50,249][04624] Decorrelating experience for 32 frames...
[2025-02-14 07:22:50,251][04621] Decorrelating experience for 32 frames...
[2025-02-14 07:22:50,258][04628] Decorrelating experience for 32 frames...
[2025-02-14 07:22:50,268][04626] Decorrelating experience for 0 frames...
[2025-02-14 07:22:50,678][04629] Decorrelating experience for 32 frames...
[2025-02-14 07:22:51,497][04627] Decorrelating experience for 32 frames...
[2025-02-14 07:22:51,530][04626] Decorrelating experience for 32 frames...
[2025-02-14 07:22:51,821][04629] Decorrelating experience for 64 frames...
[2025-02-14 07:22:51,852][04621] Decorrelating experience for 64 frames...
[2025-02-14 07:22:51,854][04628] Decorrelating experience for 64 frames...
[2025-02-14 07:22:52,490][04623] Decorrelating experience for 0 frames...
[2025-02-14 07:22:52,693][04624] Decorrelating experience for 64 frames...
[2025-02-14 07:22:52,715][04627] Decorrelating experience for 64 frames...
[2025-02-14 07:22:53,247][04623] Decorrelating experience for 32 frames...
[2025-02-14 07:22:53,911][04628] Decorrelating experience for 96 frames...
[2025-02-14 07:22:54,152][04621] Decorrelating experience for 96 frames...
[2025-02-14 07:22:54,240][00436] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-14 07:22:54,306][04624] Decorrelating experience for 96 frames...
[2025-02-14 07:22:54,942][04623] Decorrelating experience for 64 frames...
[2025-02-14 07:22:55,369][04627] Decorrelating experience for 96 frames...
[2025-02-14 07:22:55,639][04626] Decorrelating experience for 64 frames...
[2025-02-14 07:22:56,649][04623] Decorrelating experience for 96 frames...
[2025-02-14 07:22:57,544][04629] Decorrelating experience for 96 frames...
[2025-02-14 07:22:59,080][04626] Decorrelating experience for 96 frames...
[2025-02-14 07:22:59,227][04608] Signal inference workers to stop experience collection...
[2025-02-14 07:22:59,238][04622] InferenceWorker_p0-w0: stopping experience collection
[2025-02-14 07:22:59,240][00436] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 38.2. Samples: 382. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-14 07:22:59,247][00436] Avg episode reward: [(0, '2.646')]
[2025-02-14 07:23:01,301][04608] Signal inference workers to resume experience collection...
[2025-02-14 07:23:01,302][04622] InferenceWorker_p0-w0: resuming experience collection
[2025-02-14 07:23:04,240][00436] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 243.2. Samples: 3648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:23:04,242][00436] Avg episode reward: [(0, '3.394')]
[2025-02-14 07:23:09,240][00436] Fps is (10 sec: 3686.4, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 479.7. Samples: 9594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:23:09,244][00436] Avg episode reward: [(0, '3.836')]
[2025-02-14 07:23:10,469][04622] Updated weights for policy 0, policy_version 10 (0.0013)
[2025-02-14 07:23:14,240][00436] Fps is (10 sec: 3686.4, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 458.2. Samples: 11456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:23:14,248][00436] Avg episode reward: [(0, '4.286')]
[2025-02-14 07:23:19,240][00436] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 73728. Throughput: 0: 591.1. Samples: 17734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:23:19,245][00436] Avg episode reward: [(0, '4.245')]
[2025-02-14 07:23:20,481][04622] Updated weights for policy 0, policy_version 20 (0.0021)
[2025-02-14 07:23:24,241][00436] Fps is (10 sec: 3686.0, 60 sec: 2574.5, 300 sec: 2574.5). Total num frames: 90112. Throughput: 0: 670.6. Samples: 23472. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:23:24,244][00436] Avg episode reward: [(0, '4.288')]
[2025-02-14 07:23:29,240][00436] Fps is (10 sec: 3686.4, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 636.9. Samples: 25476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:23:29,242][00436] Avg episode reward: [(0, '4.367')]
[2025-02-14 07:23:29,252][04608] Saving new best policy, reward=4.367!
[2025-02-14 07:23:31,799][04622] Updated weights for policy 0, policy_version 30 (0.0018)
[2025-02-14 07:23:34,240][00436] Fps is (10 sec: 4096.5, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 714.4. Samples: 32150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:23:34,245][00436] Avg episode reward: [(0, '4.274')]
[2025-02-14 07:23:39,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 151552. Throughput: 0: 848.7. Samples: 38192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:23:39,243][00436] Avg episode reward: [(0, '4.377')]
[2025-02-14 07:23:39,249][04608] Saving new best policy, reward=4.377!
[2025-02-14 07:23:42,735][04622] Updated weights for policy 0, policy_version 40 (0.0017)
[2025-02-14 07:23:44,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 886.0. Samples: 40252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:23:44,242][00436] Avg episode reward: [(0, '4.520')]
[2025-02-14 07:23:44,249][04608] Saving new best policy, reward=4.520!
[2025-02-14 07:23:49,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 192512. Throughput: 0: 963.3. Samples: 46996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:23:49,244][00436] Avg episode reward: [(0, '4.367')]
[2025-02-14 07:23:51,748][04622] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-02-14 07:23:54,241][00436] Fps is (10 sec: 4095.7, 60 sec: 3481.5, 300 sec: 3213.7). Total num frames: 208896. Throughput: 0: 963.8. Samples: 52966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:23:54,244][00436] Avg episode reward: [(0, '4.445')]
[2025-02-14 07:23:59,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 229376. Throughput: 0: 975.2. Samples: 55340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:23:59,242][00436] Avg episode reward: [(0, '4.501')]
[2025-02-14 07:24:02,566][04622] Updated weights for policy 0, policy_version 60 (0.0015)
[2025-02-14 07:24:04,240][00436] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 986.3. Samples: 62116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:24:04,243][00436] Avg episode reward: [(0, '4.277')]
[2025-02-14 07:24:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3379.2). Total num frames: 270336. Throughput: 0: 987.7. Samples: 67916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:24:09,247][00436] Avg episode reward: [(0, '4.343')]
[2025-02-14 07:24:13,251][04622] Updated weights for policy 0, policy_version 70 (0.0015)
[2025-02-14 07:24:14,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3421.4). Total num frames: 290816. Throughput: 0: 995.9. Samples: 70290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:24:14,244][00436] Avg episode reward: [(0, '4.337')]
[2025-02-14 07:24:19,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3458.8). Total num frames: 311296. Throughput: 0: 1001.8. Samples: 77232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:24:19,246][00436] Avg episode reward: [(0, '4.387')]
[2025-02-14 07:24:19,254][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth...
[2025-02-14 07:24:23,200][04622] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-02-14 07:24:24,242][00436] Fps is (10 sec: 3685.8, 60 sec: 3959.4, 300 sec: 3449.2). Total num frames: 327680. Throughput: 0: 987.6. Samples: 82636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:24:24,244][00436] Avg episode reward: [(0, '4.382')]
[2025-02-14 07:24:29,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3481.6). Total num frames: 348160. Throughput: 0: 998.9. Samples: 85204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:24:29,242][00436] Avg episode reward: [(0, '4.256')]
[2025-02-14 07:24:33,177][04622] Updated weights for policy 0, policy_version 90 (0.0024)
[2025-02-14 07:24:34,240][00436] Fps is (10 sec: 4506.4, 60 sec: 4027.7, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 1000.8. Samples: 92034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:24:34,242][00436] Avg episode reward: [(0, '4.277')]
[2025-02-14 07:24:39,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3537.5). Total num frames: 389120. Throughput: 0: 989.5. Samples: 97492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:24:39,243][00436] Avg episode reward: [(0, '4.448')]
[2025-02-14 07:24:43,860][04622] Updated weights for policy 0, policy_version 100 (0.0023)
[2025-02-14 07:24:44,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3561.7). Total num frames: 409600. Throughput: 0: 1000.4. Samples: 100360. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:24:44,245][00436] Avg episode reward: [(0, '4.594')]
[2025-02-14 07:24:44,247][04608] Saving new best policy, reward=4.594!
[2025-02-14 07:24:49,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3584.0). Total num frames: 430080. Throughput: 0: 997.3. Samples: 106996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:24:49,246][00436] Avg episode reward: [(0, '4.487')]
[2025-02-14 07:24:54,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3571.7). Total num frames: 446464. Throughput: 0: 985.6. Samples: 112270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:24:54,242][00436] Avg episode reward: [(0, '4.411')]
[2025-02-14 07:24:54,735][04622] Updated weights for policy 0, policy_version 110 (0.0012)
[2025-02-14 07:24:59,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3591.9). Total num frames: 466944. Throughput: 0: 999.7. Samples: 115278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:24:59,246][00436] Avg episode reward: [(0, '4.199')]
[2025-02-14 07:25:03,690][04622] Updated weights for policy 0, policy_version 120 (0.0019)
[2025-02-14 07:25:04,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3640.9). Total num frames: 491520. Throughput: 0: 996.9. Samples: 122092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:04,246][00436] Avg episode reward: [(0, '4.182')]
[2025-02-14 07:25:09,242][00436] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3627.8). Total num frames: 507904. Throughput: 0: 988.2. Samples: 127106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:25:09,244][00436] Avg episode reward: [(0, '4.300')]
[2025-02-14 07:25:14,240][00436] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3644.0). Total num frames: 528384. Throughput: 0: 1002.4. Samples: 130310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:14,243][00436] Avg episode reward: [(0, '4.578')]
[2025-02-14 07:25:14,599][04622] Updated weights for policy 0, policy_version 130 (0.0013)
[2025-02-14 07:25:19,241][00436] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 552960. Throughput: 0: 1000.6. Samples: 137060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:19,246][00436] Avg episode reward: [(0, '4.645')]
[2025-02-14 07:25:19,251][04608] Saving new best policy, reward=4.645!
[2025-02-14 07:25:24,240][00436] Fps is (10 sec: 3686.5, 60 sec: 3959.6, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 986.7. Samples: 141892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:25:24,244][00436] Avg episode reward: [(0, '4.720')]
[2025-02-14 07:25:24,249][04608] Saving new best policy, reward=4.720!
[2025-02-14 07:25:25,492][04622] Updated weights for policy 0, policy_version 140 (0.0027)
[2025-02-14 07:25:29,240][00436] Fps is (10 sec: 3686.7, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 589824. Throughput: 0: 997.2. Samples: 145232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:29,243][00436] Avg episode reward: [(0, '4.658')]
[2025-02-14 07:25:34,244][00436] Fps is (10 sec: 4504.0, 60 sec: 3959.2, 300 sec: 3698.7). Total num frames: 610304. Throughput: 0: 1001.8. Samples: 152080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:25:34,246][00436] Avg episode reward: [(0, '4.672')]
[2025-02-14 07:25:34,750][04622] Updated weights for policy 0, policy_version 150 (0.0025)
[2025-02-14 07:25:39,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 626688. Throughput: 0: 993.2. Samples: 156964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:39,246][00436] Avg episode reward: [(0, '4.699')]
[2025-02-14 07:25:44,240][00436] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3698.1). Total num frames: 647168. Throughput: 0: 1002.0. Samples: 160368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:44,246][00436] Avg episode reward: [(0, '4.515')]
[2025-02-14 07:25:45,218][04622] Updated weights for policy 0, policy_version 160 (0.0016)
[2025-02-14 07:25:49,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3731.9). Total num frames: 671744. Throughput: 0: 999.6. Samples: 167076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:49,249][00436] Avg episode reward: [(0, '4.365')]
[2025-02-14 07:25:54,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3719.6). Total num frames: 688128. Throughput: 0: 997.6. Samples: 171998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:25:54,245][00436] Avg episode reward: [(0, '4.361')]
[2025-02-14 07:25:55,918][04622] Updated weights for policy 0, policy_version 170 (0.0014)
[2025-02-14 07:25:59,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3729.5). Total num frames: 708608. Throughput: 0: 1003.0. Samples: 175446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:25:59,243][00436] Avg episode reward: [(0, '4.560')]
[2025-02-14 07:26:04,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3738.9). Total num frames: 729088. Throughput: 0: 1004.5. Samples: 182262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:04,249][00436] Avg episode reward: [(0, '4.709')]
[2025-02-14 07:26:06,132][04622] Updated weights for policy 0, policy_version 180 (0.0019)
[2025-02-14 07:26:09,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3727.4). Total num frames: 745472. Throughput: 0: 1005.4. Samples: 187136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:09,244][00436] Avg episode reward: [(0, '4.697')]
[2025-02-14 07:26:14,240][00436] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3756.3). Total num frames: 770048. Throughput: 0: 1007.0. Samples: 190548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:14,243][00436] Avg episode reward: [(0, '4.887')]
[2025-02-14 07:26:14,246][04608] Saving new best policy, reward=4.887!
[2025-02-14 07:26:15,649][04622] Updated weights for policy 0, policy_version 190 (0.0013)
[2025-02-14 07:26:19,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 998.9. Samples: 197026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:26:19,244][00436] Avg episode reward: [(0, '4.872')]
[2025-02-14 07:26:19,255][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
[2025-02-14 07:26:24,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3753.1). Total num frames: 806912. Throughput: 0: 998.2. Samples: 201882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:26:24,246][00436] Avg episode reward: [(0, '4.842')]
[2025-02-14 07:26:26,561][04622] Updated weights for policy 0, policy_version 200 (0.0018)
[2025-02-14 07:26:29,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3779.5). Total num frames: 831488. Throughput: 0: 999.8. Samples: 205358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:29,242][00436] Avg episode reward: [(0, '4.864')]
[2025-02-14 07:26:34,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3768.3). Total num frames: 847872. Throughput: 0: 996.6. Samples: 211922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:34,246][00436] Avg episode reward: [(0, '4.791')]
[2025-02-14 07:26:37,364][04622] Updated weights for policy 0, policy_version 210 (0.0014)
[2025-02-14 07:26:39,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3775.4). Total num frames: 868352. Throughput: 0: 1002.2. Samples: 217098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:39,242][00436] Avg episode reward: [(0, '4.939')]
[2025-02-14 07:26:39,247][04608] Saving new best policy, reward=4.939!
[2025-02-14 07:26:44,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3782.3). Total num frames: 888832. Throughput: 0: 1001.7. Samples: 220524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:44,243][00436] Avg episode reward: [(0, '5.081')]
[2025-02-14 07:26:44,245][04608] Saving new best policy, reward=5.081!
[2025-02-14 07:26:46,331][04622] Updated weights for policy 0, policy_version 220 (0.0020)
[2025-02-14 07:26:49,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3771.7). Total num frames: 905216. Throughput: 0: 987.6. Samples: 226706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:49,252][00436] Avg episode reward: [(0, '5.290')]
[2025-02-14 07:26:49,320][04608] Saving new best policy, reward=5.290!
[2025-02-14 07:26:54,242][00436] Fps is (10 sec: 3685.8, 60 sec: 3959.4, 300 sec: 3778.3). Total num frames: 925696. Throughput: 0: 995.3. Samples: 231926. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:26:54,244][00436] Avg episode reward: [(0, '5.280')]
[2025-02-14 07:26:57,160][04622] Updated weights for policy 0, policy_version 230 (0.0017)
[2025-02-14 07:26:59,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3801.1). Total num frames: 950272. Throughput: 0: 995.2. Samples: 235330. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:26:59,242][00436] Avg episode reward: [(0, '5.437')]
[2025-02-14 07:26:59,253][04608] Saving new best policy, reward=5.437!
[2025-02-14 07:27:04,240][00436] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3790.8). Total num frames: 966656. Throughput: 0: 987.3. Samples: 241456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:27:04,244][00436] Avg episode reward: [(0, '5.557')]
[2025-02-14 07:27:04,249][04608] Saving new best policy, reward=5.557!
[2025-02-14 07:27:08,109][04622] Updated weights for policy 0, policy_version 240 (0.0014)
[2025-02-14 07:27:09,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3796.7). Total num frames: 987136. Throughput: 0: 1000.1. Samples: 246888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:27:09,242][00436] Avg episode reward: [(0, '5.509')]
[2025-02-14 07:27:14,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3802.3). Total num frames: 1007616. Throughput: 0: 997.3. Samples: 250238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:27:14,242][00436] Avg episode reward: [(0, '5.790')]
[2025-02-14 07:27:14,245][04608] Saving new best policy, reward=5.790!
[2025-02-14 07:27:18,724][04622] Updated weights for policy 0, policy_version 250 (0.0018)
[2025-02-14 07:27:19,241][00436] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3792.6). Total num frames: 1024000. Throughput: 0: 975.1. Samples: 255804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:27:19,249][00436] Avg episode reward: [(0, '6.032')]
[2025-02-14 07:27:19,258][04608] Saving new best policy, reward=6.032!
[2025-02-14 07:27:24,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3798.1). Total num frames: 1044480. Throughput: 0: 980.8. Samples: 261234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:27:24,249][00436] Avg episode reward: [(0, '6.231')]
[2025-02-14 07:27:24,253][04608] Saving new best policy, reward=6.231!
[2025-02-14 07:27:28,638][04622] Updated weights for policy 0, policy_version 260 (0.0035)
[2025-02-14 07:27:29,242][00436] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3803.4). Total num frames: 1064960. Throughput: 0: 978.8. Samples: 264572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:27:29,245][00436] Avg episode reward: [(0, '6.061')]
[2025-02-14 07:27:34,240][00436] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3794.2). Total num frames: 1081344. Throughput: 0: 973.4. Samples: 270508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:27:34,242][00436] Avg episode reward: [(0, '6.566')]
[2025-02-14 07:27:34,248][04608] Saving new best policy, reward=6.566!
[2025-02-14 07:27:39,168][04622] Updated weights for policy 0, policy_version 270 (0.0016)
[2025-02-14 07:27:39,240][00436] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3813.5). Total num frames: 1105920. Throughput: 0: 986.1. Samples: 276300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:27:39,243][00436] Avg episode reward: [(0, '6.791')]
[2025-02-14 07:27:39,253][04608] Saving new best policy, reward=6.791!
[2025-02-14 07:27:44,240][00436] Fps is (10 sec: 4505.6, 60 sec: 3959.4, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 985.7. Samples: 279688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:27:44,246][00436] Avg episode reward: [(0, '7.181')]
[2025-02-14 07:27:44,249][04608] Saving new best policy, reward=7.181!
[2025-02-14 07:27:49,247][00436] Fps is (10 sec: 3683.8, 60 sec: 3959.0, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 971.6. Samples: 285186. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:27:49,252][00436] Avg episode reward: [(0, '7.295')]
[2025-02-14 07:27:49,259][04608] Saving new best policy, reward=7.295!
[2025-02-14 07:27:50,495][04622] Updated weights for policy 0, policy_version 280 (0.0024)
[2025-02-14 07:27:54,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3943.3). Total num frames: 1163264. Throughput: 0: 982.0. Samples: 291078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:27:54,249][00436] Avg episode reward: [(0, '7.602')]
[2025-02-14 07:27:54,251][04608] Saving new best policy, reward=7.602!
[2025-02-14 07:27:59,240][00436] Fps is (10 sec: 4098.9, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1183744. Throughput: 0: 981.1. Samples: 294388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:27:59,247][00436] Avg episode reward: [(0, '6.568')]
[2025-02-14 07:27:59,274][04622] Updated weights for policy 0, policy_version 290 (0.0014)
[2025-02-14 07:28:04,240][00436] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1200128. Throughput: 0: 979.4. Samples: 299878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:28:04,242][00436] Avg episode reward: [(0, '6.722')]
[2025-02-14 07:28:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1224704. Throughput: 0: 998.8. Samples: 306182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:28:09,243][00436] Avg episode reward: [(0, '6.874')]
[2025-02-14 07:28:10,147][04622] Updated weights for policy 0, policy_version 300 (0.0025)
[2025-02-14 07:28:14,240][00436] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1245184. Throughput: 0: 1000.7. Samples: 309602. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:28:14,245][00436] Avg episode reward: [(0, '7.821')]
[2025-02-14 07:28:14,251][04608] Saving new best policy, reward=7.821!
[2025-02-14 07:28:19,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 1261568. Throughput: 0: 984.1. Samples: 314794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:28:19,247][00436] Avg episode reward: [(0, '8.107')]
[2025-02-14 07:28:19,255][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth...
[2025-02-14 07:28:19,373][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth
[2025-02-14 07:28:19,381][04608] Saving new best policy, reward=8.107!
[2025-02-14 07:28:20,915][04622] Updated weights for policy 0, policy_version 310 (0.0021)
[2025-02-14 07:28:24,240][00436] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1282048. Throughput: 0: 993.7. Samples: 321016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:28:24,242][00436] Avg episode reward: [(0, '8.278')]
[2025-02-14 07:28:24,248][04608] Saving new best policy, reward=8.278!
[2025-02-14 07:28:29,242][00436] Fps is (10 sec: 4504.9, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1306624. Throughput: 0: 993.1. Samples: 324378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:28:29,245][00436] Avg episode reward: [(0, '8.424')]
[2025-02-14 07:28:29,253][04608] Saving new best policy, reward=8.424!
[2025-02-14 07:28:30,726][04622] Updated weights for policy 0, policy_version 320 (0.0018)
[2025-02-14 07:28:34,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1318912. Throughput: 0: 983.1. Samples: 329420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:28:34,242][00436] Avg episode reward: [(0, '8.857')]
[2025-02-14 07:28:34,250][04608] Saving new best policy, reward=8.857!
[2025-02-14 07:28:39,241][00436] Fps is (10 sec: 3686.7, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 1343488. Throughput: 0: 999.2. Samples: 336042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:28:39,243][00436] Avg episode reward: [(0, '9.158')]
[2025-02-14 07:28:39,253][04608] Saving new best policy, reward=9.158!
[2025-02-14 07:28:40,854][04622] Updated weights for policy 0, policy_version 330 (0.0023)
[2025-02-14 07:28:44,241][00436] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1363968. Throughput: 0: 1001.1. Samples: 339438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:28:44,245][00436] Avg episode reward: [(0, '9.227')]
[2025-02-14 07:28:44,248][04608] Saving new best policy, reward=9.227!
[2025-02-14 07:28:49,240][00436] Fps is (10 sec: 3686.7, 60 sec: 3959.9, 300 sec: 3971.0). Total num frames: 1380352. Throughput: 0: 986.3. Samples: 344260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:28:49,242][00436] Avg episode reward: [(0, '9.319')]
[2025-02-14 07:28:49,250][04608] Saving new best policy, reward=9.319!
[2025-02-14 07:28:51,684][04622] Updated weights for policy 0, policy_version 340 (0.0014)
[2025-02-14 07:28:54,240][00436] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1400832. Throughput: 0: 994.0. Samples: 350914. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:28:54,246][00436] Avg episode reward: [(0, '9.927')]
[2025-02-14 07:28:54,248][04608] Saving new best policy, reward=9.927!
[2025-02-14 07:28:59,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1421312. Throughput: 0: 991.6. Samples: 354222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:28:59,242][00436] Avg episode reward: [(0, '10.006')]
[2025-02-14 07:28:59,261][04608] Saving new best policy, reward=10.006!
[2025-02-14 07:29:02,746][04622] Updated weights for policy 0, policy_version 350 (0.0013)
[2025-02-14 07:29:04,240][00436] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 1437696. Throughput: 0: 981.3. Samples: 358954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:29:04,246][00436] Avg episode reward: [(0, '9.969')]
[2025-02-14 07:29:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1462272. Throughput: 0: 996.9. Samples: 365876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:29:09,244][00436] Avg episode reward: [(0, '10.037')]
[2025-02-14 07:29:09,251][04608] Saving new best policy, reward=10.037!
[2025-02-14 07:29:11,475][04622] Updated weights for policy 0, policy_version 360 (0.0019)
[2025-02-14 07:29:14,241][00436] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1482752. Throughput: 0: 997.3. Samples: 369256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:29:14,245][00436] Avg episode reward: [(0, '10.848')]
[2025-02-14 07:29:14,250][04608] Saving new best policy, reward=10.848!
[2025-02-14 07:29:19,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 1499136. Throughput: 0: 987.4. Samples: 373852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:29:19,243][00436] Avg episode reward: [(0, '11.494')]
[2025-02-14 07:29:19,250][04608] Saving new best policy, reward=11.494!
[2025-02-14 07:29:22,503][04622] Updated weights for policy 0, policy_version 370 (0.0016)
[2025-02-14 07:29:24,240][00436] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1523712. Throughput: 0: 991.9. Samples: 380676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:29:24,248][00436] Avg episode reward: [(0, '11.714')]
[2025-02-14 07:29:24,250][04608] Saving new best policy, reward=11.714!
[2025-02-14 07:29:29,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 1540096. Throughput: 0: 992.4. Samples: 384094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:29:29,243][00436] Avg episode reward: [(0, '11.695')]
[2025-02-14 07:29:33,326][04622] Updated weights for policy 0, policy_version 380 (0.0012)
[2025-02-14 07:29:34,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1560576. Throughput: 0: 990.9. Samples: 388852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:29:34,246][00436] Avg episode reward: [(0, '11.970')]
[2025-02-14 07:29:34,248][04608] Saving new best policy, reward=11.970!
[2025-02-14 07:29:39,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1581056. Throughput: 0: 996.3. Samples: 395746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:29:39,247][00436] Avg episode reward: [(0, '12.113')]
[2025-02-14 07:29:39,259][04608] Saving new best policy, reward=12.113!
[2025-02-14 07:29:42,495][04622] Updated weights for policy 0, policy_version 390 (0.0012)
[2025-02-14 07:29:44,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1601536. Throughput: 0: 999.4. Samples: 399196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:29:44,247][00436] Avg episode reward: [(0, '12.308')]
[2025-02-14 07:29:44,255][04608] Saving new best policy, reward=12.308!
[2025-02-14 07:29:49,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1617920. Throughput: 0: 997.6. Samples: 403848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:29:49,246][00436] Avg episode reward: [(0, '12.246')]
[2025-02-14 07:29:53,041][04622] Updated weights for policy 0, policy_version 400 (0.0019)
[2025-02-14 07:29:54,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1642496. Throughput: 0: 997.4. Samples: 410760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:29:54,246][00436] Avg episode reward: [(0, '11.491')]
[2025-02-14 07:29:59,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1658880. Throughput: 0: 996.5. Samples: 414098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:29:59,245][00436] Avg episode reward: [(0, '12.670')]
[2025-02-14 07:29:59,257][04608] Saving new best policy, reward=12.670!
[2025-02-14 07:30:03,836][04622] Updated weights for policy 0, policy_version 410 (0.0023)
[2025-02-14 07:30:04,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.1). Total num frames: 1679360. Throughput: 0: 1003.1. Samples: 418992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:30:04,248][00436] Avg episode reward: [(0, '14.654')]
[2025-02-14 07:30:04,251][04608] Saving new best policy, reward=14.654!
[2025-02-14 07:30:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1699840. Throughput: 0: 999.4. Samples: 425650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:30:09,243][00436] Avg episode reward: [(0, '15.103')]
[2025-02-14 07:30:09,249][04608] Saving new best policy, reward=15.103!
[2025-02-14 07:30:14,215][04622] Updated weights for policy 0, policy_version 420 (0.0014)
[2025-02-14 07:30:14,240][00436] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1720320. Throughput: 0: 992.8. Samples: 428772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:30:14,243][00436] Avg episode reward: [(0, '15.350')]
[2025-02-14 07:30:14,245][04608] Saving new best policy, reward=15.350!
[2025-02-14 07:30:19,240][00436] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1736704. Throughput: 0: 994.4. Samples: 433602. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:30:19,243][00436] Avg episode reward: [(0, '14.708')]
[2025-02-14 07:30:19,250][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth...
[2025-02-14 07:30:19,362][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth
[2025-02-14 07:30:23,791][04622] Updated weights for policy 0, policy_version 430 (0.0024)
[2025-02-14 07:30:24,240][00436] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1761280. Throughput: 0: 997.3. Samples: 440624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:30:24,242][00436] Avg episode reward: [(0, '14.736')]
[2025-02-14 07:30:29,240][00436] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1777664. Throughput: 0: 986.9. Samples: 443606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:30:29,247][00436] Avg episode reward: [(0, '16.641')]
[2025-02-14 07:30:29,256][04608] Saving new best policy, reward=16.641!
[2025-02-14 07:30:34,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1798144. Throughput: 0: 995.7. Samples: 448654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:30:34,246][00436] Avg episode reward: [(0, '16.194')]
[2025-02-14 07:30:34,803][04622] Updated weights for policy 0, policy_version 440 (0.0021)
[2025-02-14 07:30:39,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1822720. Throughput: 0: 995.7. Samples: 455568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:30:39,244][00436] Avg episode reward: [(0, '17.385')]
[2025-02-14 07:30:39,252][04608] Saving new best policy, reward=17.385!
[2025-02-14 07:30:44,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1839104. Throughput: 0: 985.5. Samples: 458446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:30:44,247][00436] Avg episode reward: [(0, '15.577')]
[2025-02-14 07:30:45,592][04622] Updated weights for policy 0, policy_version 450 (0.0025)
[2025-02-14 07:30:49,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1859584. Throughput: 0: 995.7. Samples: 463798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:30:49,244][00436] Avg episode reward: [(0, '14.558')]
[2025-02-14 07:30:54,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1880064. Throughput: 0: 1000.8. Samples: 470688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:30:54,247][00436] Avg episode reward: [(0, '15.108')]
[2025-02-14 07:30:54,380][04622] Updated weights for policy 0, policy_version 460 (0.0031)
[2025-02-14 07:30:59,240][00436] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 1896448. Throughput: 0: 995.6. Samples: 473576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:30:59,247][00436] Avg episode reward: [(0, '15.371')]
[2025-02-14 07:31:04,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1916928. Throughput: 0: 1008.5. Samples: 478986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:04,247][00436] Avg episode reward: [(0, '16.063')]
[2025-02-14 07:31:05,082][04622] Updated weights for policy 0, policy_version 470 (0.0019)
[2025-02-14 07:31:09,240][00436] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1941504. Throughput: 0: 1006.7. Samples: 485926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:09,242][00436] Avg episode reward: [(0, '17.060')]
[2025-02-14 07:31:14,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1957888. Throughput: 0: 1000.9. Samples: 488646. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:31:14,246][00436] Avg episode reward: [(0, '17.102')]
[2025-02-14 07:31:15,818][04622] Updated weights for policy 0, policy_version 480 (0.0034)
[2025-02-14 07:31:19,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3971.0). Total num frames: 1978368. Throughput: 0: 1008.6. Samples: 494040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:31:19,247][00436] Avg episode reward: [(0, '16.683')]
[2025-02-14 07:31:24,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2002944. Throughput: 0: 1010.4. Samples: 501034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:24,245][00436] Avg episode reward: [(0, '18.929')]
[2025-02-14 07:31:24,248][04608] Saving new best policy, reward=18.929!
[2025-02-14 07:31:25,154][04622] Updated weights for policy 0, policy_version 490 (0.0015)
[2025-02-14 07:31:29,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2015232. Throughput: 0: 1000.4. Samples: 503466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:31:29,249][00436] Avg episode reward: [(0, '18.615')]
[2025-02-14 07:31:34,240][00436] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2039808. Throughput: 0: 1006.4. Samples: 509086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:34,247][00436] Avg episode reward: [(0, '19.650')]
[2025-02-14 07:31:34,251][04608] Saving new best policy, reward=19.650!
[2025-02-14 07:31:35,627][04622] Updated weights for policy 0, policy_version 500 (0.0013)
[2025-02-14 07:31:39,240][00436] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2064384. Throughput: 0: 1007.1. Samples: 516006. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:39,247][00436] Avg episode reward: [(0, '20.814')]
[2025-02-14 07:31:39,254][04608] Saving new best policy, reward=20.814!
[2025-02-14 07:31:44,240][00436] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2076672. Throughput: 0: 995.5. Samples: 518372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:44,245][00436] Avg episode reward: [(0, '21.065')]
[2025-02-14 07:31:44,247][04608] Saving new best policy, reward=21.065!
[2025-02-14 07:31:46,459][04622] Updated weights for policy 0, policy_version 510 (0.0016)
[2025-02-14 07:31:49,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2101248. Throughput: 0: 1001.6. Samples: 524056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:49,242][00436] Avg episode reward: [(0, '19.696')]
[2025-02-14 07:31:54,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2121728. Throughput: 0: 998.8. Samples: 530870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:31:54,244][00436] Avg episode reward: [(0, '21.346')]
[2025-02-14 07:31:54,250][04608] Saving new best policy, reward=21.346!
[2025-02-14 07:31:56,859][04622] Updated weights for policy 0, policy_version 520 (0.0018)
[2025-02-14 07:31:59,240][00436] Fps is (10 sec: 3686.5, 60 sec: 4027.8, 300 sec: 3971.0). Total num frames: 2138112. Throughput: 0: 984.2. Samples: 532934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:31:59,245][00436] Avg episode reward: [(0, '22.718')]
[2025-02-14 07:31:59,253][04608] Saving new best policy, reward=22.718!
[2025-02-14 07:32:04,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2158592. Throughput: 0: 997.2. Samples: 538914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:32:04,242][00436] Avg episode reward: [(0, '23.126')]
[2025-02-14 07:32:04,248][04608] Saving new best policy, reward=23.126!
[2025-02-14 07:32:06,464][04622] Updated weights for policy 0, policy_version 530 (0.0013)
[2025-02-14 07:32:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2179072. Throughput: 0: 991.0. Samples: 545630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:32:09,244][00436] Avg episode reward: [(0, '23.739')]
[2025-02-14 07:32:09,263][04608] Saving new best policy, reward=23.739!
[2025-02-14 07:32:14,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2195456. Throughput: 0: 980.5. Samples: 547588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:32:14,243][00436] Avg episode reward: [(0, '23.968')]
[2025-02-14 07:32:14,248][04608] Saving new best policy, reward=23.968!
[2025-02-14 07:32:17,319][04622] Updated weights for policy 0, policy_version 540 (0.0019)
[2025-02-14 07:32:19,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2220032. Throughput: 0: 993.9. Samples: 553812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:32:19,243][00436] Avg episode reward: [(0, '23.042')]
[2025-02-14 07:32:19,251][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth...
[2025-02-14 07:32:19,380][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth
[2025-02-14 07:32:24,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 2236416. Throughput: 0: 982.0. Samples: 560194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:32:24,242][00436] Avg episode reward: [(0, '21.758')]
[2025-02-14 07:32:28,368][04622] Updated weights for policy 0, policy_version 550 (0.0024)
[2025-02-14 07:32:29,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2256896. Throughput: 0: 974.9. Samples: 562244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:32:29,245][00436] Avg episode reward: [(0, '20.389')]
[2025-02-14 07:32:34,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2277376. Throughput: 0: 993.1. Samples: 568746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:32:34,242][00436] Avg episode reward: [(0, '19.443')]
[2025-02-14 07:32:37,116][04622] Updated weights for policy 0, policy_version 560 (0.0030)
[2025-02-14 07:32:39,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2297856. Throughput: 0: 986.1. Samples: 575244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:32:39,243][00436] Avg episode reward: [(0, '18.236')]
[2025-02-14 07:32:44,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2314240. Throughput: 0: 985.6. Samples: 577288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:32:44,242][00436] Avg episode reward: [(0, '18.383')]
[2025-02-14 07:32:47,956][04622] Updated weights for policy 0, policy_version 570 (0.0026)
[2025-02-14 07:32:49,240][00436] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2338816. Throughput: 0: 1003.3. Samples: 584064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:32:49,246][00436] Avg episode reward: [(0, '18.782')]
[2025-02-14 07:32:54,245][00436] Fps is (10 sec: 4503.6, 60 sec: 3959.2, 300 sec: 3984.9). Total num frames: 2359296. Throughput: 0: 987.7. Samples: 590080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:32:54,249][00436] Avg episode reward: [(0, '20.879')]
[2025-02-14 07:32:58,587][04622] Updated weights for policy 0, policy_version 580 (0.0034)
[2025-02-14 07:32:59,240][00436] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2375680. Throughput: 0: 991.2. Samples: 592190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:32:59,246][00436] Avg episode reward: [(0, '21.103')]
[2025-02-14 07:33:04,240][00436] Fps is (10 sec: 4097.8, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2400256. Throughput: 0: 1007.4. Samples: 599144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:33:04,243][00436] Avg episode reward: [(0, '21.663')]
[2025-02-14 07:33:07,910][04622] Updated weights for policy 0, policy_version 590 (0.0013)
[2025-02-14 07:33:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2416640. Throughput: 0: 998.9. Samples: 605144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:33:09,243][00436] Avg episode reward: [(0, '20.637')]
[2025-02-14 07:33:14,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2437120. Throughput: 0: 1006.1. Samples: 607518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:33:14,246][00436] Avg episode reward: [(0, '20.703')]
[2025-02-14 07:33:18,084][04622] Updated weights for policy 0, policy_version 600 (0.0017)
[2025-02-14 07:33:19,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2461696. Throughput: 0: 1012.5. Samples: 614310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:33:19,247][00436] Avg episode reward: [(0, '20.128')]
[2025-02-14 07:33:24,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.1). Total num frames: 2478080. Throughput: 0: 996.6. Samples: 620090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:33:24,242][00436] Avg episode reward: [(0, '20.834')]
[2025-02-14 07:33:28,681][04622] Updated weights for policy 0, policy_version 610 (0.0020)
[2025-02-14 07:33:29,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2498560. Throughput: 0: 1010.1. Samples: 622742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:33:29,247][00436] Avg episode reward: [(0, '22.299')]
[2025-02-14 07:33:34,240][00436] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2523136. Throughput: 0: 1013.0. Samples: 629650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:33:34,243][00436] Avg episode reward: [(0, '24.283')]
[2025-02-14 07:33:34,249][04608] Saving new best policy, reward=24.283!
[2025-02-14 07:33:38,669][04622] Updated weights for policy 0, policy_version 620 (0.0014)
[2025-02-14 07:33:39,242][00436] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 3984.9). Total num frames: 2539520. Throughput: 0: 1001.7. Samples: 635152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:33:39,249][00436] Avg episode reward: [(0, '23.661')]
[2025-02-14 07:33:44,240][00436] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2560000. Throughput: 0: 1018.9. Samples: 638040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:33:44,248][00436] Avg episode reward: [(0, '24.927')]
[2025-02-14 07:33:44,252][04608] Saving new best policy, reward=24.927!
[2025-02-14 07:33:48,538][04622] Updated weights for policy 0, policy_version 630 (0.0014)
[2025-02-14 07:33:49,240][00436] Fps is (10 sec: 4096.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2580480. Throughput: 0: 1011.2. Samples: 644650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:33:49,247][00436] Avg episode reward: [(0, '24.711')]
[2025-02-14 07:33:54,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.8, 300 sec: 3984.9). Total num frames: 2596864. Throughput: 0: 991.9. Samples: 649780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:33:54,246][00436] Avg episode reward: [(0, '24.723')]
[2025-02-14 07:33:59,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2617344. Throughput: 0: 1006.0. Samples: 652788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:33:59,243][00436] Avg episode reward: [(0, '24.583')]
[2025-02-14 07:33:59,514][04622] Updated weights for policy 0, policy_version 640 (0.0017)
[2025-02-14 07:34:04,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2641920. Throughput: 0: 1007.7. Samples: 659658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:34:04,242][00436] Avg episode reward: [(0, '24.606')]
[2025-02-14 07:34:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2658304. Throughput: 0: 992.4. Samples: 664750. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:34:09,241][00436] Avg episode reward: [(0, '24.136')]
[2025-02-14 07:34:09,943][04622] Updated weights for policy 0, policy_version 650 (0.0029)
[2025-02-14 07:34:14,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2678784. Throughput: 0: 1008.1. Samples: 668106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:34:14,243][00436] Avg episode reward: [(0, '25.078')]
[2025-02-14 07:34:14,245][04608] Saving new best policy, reward=25.078!
[2025-02-14 07:34:19,205][04622] Updated weights for policy 0, policy_version 660 (0.0017)
[2025-02-14 07:34:19,240][00436] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2703360. Throughput: 0: 1001.7. Samples: 674728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:34:19,252][00436] Avg episode reward: [(0, '25.160')]
[2025-02-14 07:34:19,263][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000660_2703360.pth...
[2025-02-14 07:34:19,432][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth
[2025-02-14 07:34:19,452][04608] Saving new best policy, reward=25.160!
[2025-02-14 07:34:24,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2715648. Throughput: 0: 984.7. Samples: 679462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-14 07:34:24,242][00436] Avg episode reward: [(0, '24.458')]
[2025-02-14 07:34:29,240][00436] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2736128. Throughput: 0: 993.7. Samples: 682756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:34:29,242][00436] Avg episode reward: [(0, '24.962')]
[2025-02-14 07:34:30,103][04622] Updated weights for policy 0, policy_version 670 (0.0014)
[2025-02-14 07:34:34,246][00436] Fps is (10 sec: 4503.1, 60 sec: 3959.1, 300 sec: 3998.7). Total num frames: 2760704. Throughput: 0: 1000.9. Samples: 689698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:34:34,247][00436] Avg episode reward: [(0, '24.211')]
[2025-02-14 07:34:39,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 2777088. Throughput: 0: 994.9. Samples: 694550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:34:39,243][00436] Avg episode reward: [(0, '23.678')]
[2025-02-14 07:34:40,699][04622] Updated weights for policy 0, policy_version 680 (0.0030)
[2025-02-14 07:34:44,240][00436] Fps is (10 sec: 3688.5, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2797568. Throughput: 0: 1004.8. Samples: 698004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:34:44,249][00436] Avg episode reward: [(0, '24.150')]
[2025-02-14 07:34:49,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2822144. Throughput: 0: 1007.3. Samples: 704988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:34:49,244][00436] Avg episode reward: [(0, '24.854')]
[2025-02-14 07:34:50,372][04622] Updated weights for policy 0, policy_version 690 (0.0036)
[2025-02-14 07:34:54,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2838528. Throughput: 0: 999.1. Samples: 709708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:34:54,245][00436] Avg episode reward: [(0, '25.648')]
[2025-02-14 07:34:54,248][04608] Saving new best policy, reward=25.648!
[2025-02-14 07:34:59,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2859008. Throughput: 0: 1000.0. Samples: 713104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:34:59,246][00436] Avg episode reward: [(0, '26.143')]
[2025-02-14 07:34:59,253][04608] Saving new best policy, reward=26.143!
[2025-02-14 07:35:00,336][04622] Updated weights for policy 0, policy_version 700 (0.0022)
[2025-02-14 07:35:04,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2879488. Throughput: 0: 1005.6. Samples: 719980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:35:04,242][00436] Avg episode reward: [(0, '26.911')]
[2025-02-14 07:35:04,248][04608] Saving new best policy, reward=26.911!
[2025-02-14 07:35:09,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2895872. Throughput: 0: 1004.2. Samples: 724652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:35:09,242][00436] Avg episode reward: [(0, '26.585')]
[2025-02-14 07:35:11,118][04622] Updated weights for policy 0, policy_version 710 (0.0035)
[2025-02-14 07:35:14,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2920448. Throughput: 0: 1008.4. Samples: 728134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:35:14,242][00436] Avg episode reward: [(0, '25.920')]
[2025-02-14 07:35:19,240][00436] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2940928. Throughput: 0: 1006.7. Samples: 734994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:35:19,245][00436] Avg episode reward: [(0, '25.187')]
[2025-02-14 07:35:21,616][04622] Updated weights for policy 0, policy_version 720 (0.0026)
[2025-02-14 07:35:24,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2957312. Throughput: 0: 1009.4. Samples: 739974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:35:24,245][00436] Avg episode reward: [(0, '25.072')]
[2025-02-14 07:35:29,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2981888. Throughput: 0: 1009.6. Samples: 743436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:35:29,248][00436] Avg episode reward: [(0, '22.444')]
[2025-02-14 07:35:30,650][04622] Updated weights for policy 0, policy_version 730 (0.0019)
[2025-02-14 07:35:34,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3998.8). Total num frames: 3002368. Throughput: 0: 999.8. Samples: 749980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:35:34,247][00436] Avg episode reward: [(0, '21.739')]
[2025-02-14 07:35:39,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3018752. Throughput: 0: 1007.5. Samples: 755046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:35:39,246][00436] Avg episode reward: [(0, '23.325')]
[2025-02-14 07:35:41,428][04622] Updated weights for policy 0, policy_version 740 (0.0024)
[2025-02-14 07:35:44,242][00436] Fps is (10 sec: 4095.1, 60 sec: 4095.8, 300 sec: 4012.7). Total num frames: 3043328. Throughput: 0: 1009.3. Samples: 758524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:35:44,245][00436] Avg episode reward: [(0, '23.741')]
[2025-02-14 07:35:49,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3059712. Throughput: 0: 1002.4. Samples: 765090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:35:49,246][00436] Avg episode reward: [(0, '23.506')]
[2025-02-14 07:35:52,142][04622] Updated weights for policy 0, policy_version 750 (0.0023)
[2025-02-14 07:35:54,240][00436] Fps is (10 sec: 3687.2, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3080192. Throughput: 0: 1014.6. Samples: 770310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:35:54,248][00436] Avg episode reward: [(0, '24.373')]
[2025-02-14 07:35:59,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3100672. Throughput: 0: 1013.0. Samples: 773720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:35:59,242][00436] Avg episode reward: [(0, '25.499')]
[2025-02-14 07:36:00,922][04622] Updated weights for policy 0, policy_version 760 (0.0012)
[2025-02-14 07:36:04,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3121152. Throughput: 0: 1000.4. Samples: 780010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:36:04,242][00436] Avg episode reward: [(0, '27.260')]
[2025-02-14 07:36:04,245][04608] Saving new best policy, reward=27.260!
[2025-02-14 07:36:09,242][00436] Fps is (10 sec: 4095.4, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 3141632. Throughput: 0: 1012.1. Samples: 785522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:09,250][00436] Avg episode reward: [(0, '25.830')]
[2025-02-14 07:36:11,672][04622] Updated weights for policy 0, policy_version 770 (0.0026)
[2025-02-14 07:36:14,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3162112. Throughput: 0: 1011.8. Samples: 788968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:14,246][00436] Avg episode reward: [(0, '25.006')]
[2025-02-14 07:36:19,249][00436] Fps is (10 sec: 4093.0, 60 sec: 4027.1, 300 sec: 3998.7). Total num frames: 3182592. Throughput: 0: 1002.6. Samples: 795108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:36:19,253][00436] Avg episode reward: [(0, '26.167')]
[2025-02-14 07:36:19,263][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000777_3182592.pth...
[2025-02-14 07:36:19,428][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000542_2220032.pth
[2025-02-14 07:36:22,698][04622] Updated weights for policy 0, policy_version 780 (0.0025)
[2025-02-14 07:36:24,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3198976. Throughput: 0: 1006.8. Samples: 800350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:24,248][00436] Avg episode reward: [(0, '26.335')]
[2025-02-14 07:36:29,241][00436] Fps is (10 sec: 4099.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3223552. Throughput: 0: 1004.1. Samples: 803708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:36:29,243][00436] Avg episode reward: [(0, '26.680')]
[2025-02-14 07:36:32,451][04622] Updated weights for policy 0, policy_version 790 (0.0020)
[2025-02-14 07:36:34,243][00436] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 3239936. Throughput: 0: 989.6. Samples: 809626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:34,248][00436] Avg episode reward: [(0, '25.282')]
[2025-02-14 07:36:39,240][00436] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3260416. Throughput: 0: 1000.4. Samples: 815330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:39,248][00436] Avg episode reward: [(0, '25.117')]
[2025-02-14 07:36:42,421][04622] Updated weights for policy 0, policy_version 800 (0.0016)
[2025-02-14 07:36:44,240][00436] Fps is (10 sec: 4506.8, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 3284992. Throughput: 0: 1001.0. Samples: 818766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:36:44,246][00436] Avg episode reward: [(0, '26.939')]
[2025-02-14 07:36:49,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3297280. Throughput: 0: 991.4. Samples: 824624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:49,247][00436] Avg episode reward: [(0, '25.444')]
[2025-02-14 07:36:53,233][04622] Updated weights for policy 0, policy_version 810 (0.0023)
[2025-02-14 07:36:54,240][00436] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3317760. Throughput: 0: 998.8. Samples: 830466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:36:54,242][00436] Avg episode reward: [(0, '26.589')]
[2025-02-14 07:36:59,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3342336. Throughput: 0: 1000.0. Samples: 833970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:36:59,247][00436] Avg episode reward: [(0, '25.980')]
[2025-02-14 07:37:03,068][04622] Updated weights for policy 0, policy_version 820 (0.0013)
[2025-02-14 07:37:04,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3358720. Throughput: 0: 990.9. Samples: 839688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:37:04,247][00436] Avg episode reward: [(0, '26.249')]
[2025-02-14 07:37:09,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 3383296. Throughput: 0: 1005.4. Samples: 845594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:37:09,242][00436] Avg episode reward: [(0, '25.071')]
[2025-02-14 07:37:12,759][04622] Updated weights for policy 0, policy_version 830 (0.0019)
[2025-02-14 07:37:14,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3403776. Throughput: 0: 1008.3. Samples: 849080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:37:14,246][00436] Avg episode reward: [(0, '25.741')]
[2025-02-14 07:37:19,240][00436] Fps is (10 sec: 3686.4, 60 sec: 3960.1, 300 sec: 4012.7). Total num frames: 3420160. Throughput: 0: 1002.8. Samples: 854748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:37:19,246][00436] Avg episode reward: [(0, '24.750')]
[2025-02-14 07:37:23,694][04622] Updated weights for policy 0, policy_version 840 (0.0021)
[2025-02-14 07:37:24,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3440640. Throughput: 0: 1008.1. Samples: 860694. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:37:24,245][00436] Avg episode reward: [(0, '25.794')]
[2025-02-14 07:37:29,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 3465216. Throughput: 0: 1009.2. Samples: 864182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:37:29,247][00436] Avg episode reward: [(0, '26.312')]
[2025-02-14 07:37:34,071][04622] Updated weights for policy 0, policy_version 850 (0.0012)
[2025-02-14 07:37:34,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 3481600. Throughput: 0: 1002.7. Samples: 869744. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:37:34,242][00436] Avg episode reward: [(0, '25.470')]
[2025-02-14 07:37:39,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3502080. Throughput: 0: 1013.3. Samples: 876066. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:37:39,242][00436] Avg episode reward: [(0, '25.970')]
[2025-02-14 07:37:42,909][04622] Updated weights for policy 0, policy_version 860 (0.0028)
[2025-02-14 07:37:44,245][00436] Fps is (10 sec: 4503.5, 60 sec: 4027.4, 300 sec: 4026.5). Total num frames: 3526656. Throughput: 0: 1014.5. Samples: 879626. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:37:44,248][00436] Avg episode reward: [(0, '25.240')]
[2025-02-14 07:37:49,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4012.8). Total num frames: 3543040. Throughput: 0: 1007.6. Samples: 885032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-14 07:37:49,246][00436] Avg episode reward: [(0, '25.148')]
[2025-02-14 07:37:53,464][04622] Updated weights for policy 0, policy_version 870 (0.0024)
[2025-02-14 07:37:54,240][00436] Fps is (10 sec: 3688.1, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3563520. Throughput: 0: 1019.8. Samples: 891484. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:37:54,245][00436] Avg episode reward: [(0, '24.830')]
[2025-02-14 07:37:59,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3588096. Throughput: 0: 1019.9. Samples: 894976. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:37:59,244][00436] Avg episode reward: [(0, '24.278')]
[2025-02-14 07:38:04,184][04622] Updated weights for policy 0, policy_version 880 (0.0012)
[2025-02-14 07:38:04,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3604480. Throughput: 0: 1007.2. Samples: 900070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:38:04,242][00436] Avg episode reward: [(0, '23.579')]
[2025-02-14 07:38:09,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3624960. Throughput: 0: 1021.6. Samples: 906664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:38:09,245][00436] Avg episode reward: [(0, '23.139')]
[2025-02-14 07:38:13,068][04622] Updated weights for policy 0, policy_version 890 (0.0013)
[2025-02-14 07:38:14,240][00436] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3649536. Throughput: 0: 1022.2. Samples: 910182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:38:14,246][00436] Avg episode reward: [(0, '23.301')]
[2025-02-14 07:38:19,241][00436] Fps is (10 sec: 3685.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3661824. Throughput: 0: 1009.8. Samples: 915186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:38:19,244][00436] Avg episode reward: [(0, '23.173')]
[2025-02-14 07:38:19,256][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth...
[2025-02-14 07:38:19,417][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000660_2703360.pth
[2025-02-14 07:38:23,888][04622] Updated weights for policy 0, policy_version 900 (0.0020)
[2025-02-14 07:38:24,242][00436] Fps is (10 sec: 3685.7, 60 sec: 4095.9, 300 sec: 4026.5). Total num frames: 3686400. Throughput: 0: 1017.0. Samples: 921834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:38:24,247][00436] Avg episode reward: [(0, '23.255')]
[2025-02-14 07:38:29,241][00436] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3706880. Throughput: 0: 1012.8. Samples: 925196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:38:29,247][00436] Avg episode reward: [(0, '23.885')]
[2025-02-14 07:38:34,240][00436] Fps is (10 sec: 3687.1, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3723264. Throughput: 0: 1001.6. Samples: 930102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:38:34,247][00436] Avg episode reward: [(0, '24.474')]
[2025-02-14 07:38:34,589][04622] Updated weights for policy 0, policy_version 910 (0.0017)
[2025-02-14 07:38:39,241][00436] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3747840. Throughput: 0: 1012.0. Samples: 937024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:38:39,248][00436] Avg episode reward: [(0, '23.977')]
[2025-02-14 07:38:43,761][04622] Updated weights for policy 0, policy_version 920 (0.0014)
[2025-02-14 07:38:44,240][00436] Fps is (10 sec: 4505.7, 60 sec: 4028.0, 300 sec: 4026.6). Total num frames: 3768320. Throughput: 0: 1011.6. Samples: 940496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:38:44,242][00436] Avg episode reward: [(0, '23.977')]
[2025-02-14 07:38:49,240][00436] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3784704. Throughput: 0: 1006.9. Samples: 945382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:38:49,247][00436] Avg episode reward: [(0, '23.865')]
[2025-02-14 07:38:54,179][04622] Updated weights for policy 0, policy_version 930 (0.0018)
[2025-02-14 07:38:54,240][00436] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3809280. Throughput: 0: 1010.5. Samples: 952136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:38:54,247][00436] Avg episode reward: [(0, '25.630')]
[2025-02-14 07:38:59,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3825664. Throughput: 0: 1009.1. Samples: 955590. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-14 07:38:59,243][00436] Avg episode reward: [(0, '25.000')]
[2025-02-14 07:39:04,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3846144. Throughput: 0: 1005.9. Samples: 960452. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:39:04,248][00436] Avg episode reward: [(0, '25.738')]
[2025-02-14 07:39:04,871][04622] Updated weights for policy 0, policy_version 940 (0.0020)
[2025-02-14 07:39:09,240][00436] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3866624. Throughput: 0: 1013.2. Samples: 967426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:39:09,244][00436] Avg episode reward: [(0, '26.803')]
[2025-02-14 07:39:14,240][00436] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3887104. Throughput: 0: 1015.8. Samples: 970908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:39:14,242][00436] Avg episode reward: [(0, '26.635')]
[2025-02-14 07:39:15,209][04622] Updated weights for policy 0, policy_version 950 (0.0025)
[2025-02-14 07:39:19,240][00436] Fps is (10 sec: 4096.1, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 3907584. Throughput: 0: 1018.1. Samples: 975918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:39:19,247][00436] Avg episode reward: [(0, '24.862')]
[2025-02-14 07:39:24,154][04622] Updated weights for policy 0, policy_version 960 (0.0020)
[2025-02-14 07:39:24,240][00436] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 4054.3). Total num frames: 3932160. Throughput: 0: 1018.3. Samples: 982846. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-14 07:39:24,242][00436] Avg episode reward: [(0, '24.505')]
[2025-02-14 07:39:29,240][00436] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 3948544. Throughput: 0: 1013.4. Samples: 986098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:39:29,245][00436] Avg episode reward: [(0, '23.735')]
[2025-02-14 07:39:34,240][00436] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3969024. Throughput: 0: 1017.8. Samples: 991184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:39:34,243][00436] Avg episode reward: [(0, '23.052')]
[2025-02-14 07:39:34,843][04622] Updated weights for policy 0, policy_version 970 (0.0012)
[2025-02-14 07:39:39,240][00436] Fps is (10 sec: 4096.1, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 3989504. Throughput: 0: 1018.5. Samples: 997968. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-14 07:39:39,246][00436] Avg episode reward: [(0, '22.781')]
[2025-02-14 07:39:42,959][04608] Stopping Batcher_0...
[2025-02-14 07:39:42,960][04608] Loop batcher_evt_loop terminating...
[2025-02-14 07:39:42,960][00436] Component Batcher_0 stopped!
[2025-02-14 07:39:42,966][00436] Component RolloutWorker_w3 process died already! Don't wait for it.
[2025-02-14 07:39:42,971][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-14 07:39:43,060][04622] Weights refcount: 2 0
[2025-02-14 07:39:43,072][04622] Stopping InferenceWorker_p0-w0...
[2025-02-14 07:39:43,072][04622] Loop inference_proc0-0_evt_loop terminating...
[2025-02-14 07:39:43,072][00436] Component InferenceWorker_p0-w0 stopped!
[2025-02-14 07:39:43,097][04608] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000777_3182592.pth
[2025-02-14 07:39:43,119][04608] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-14 07:39:43,331][00436] Component LearnerWorker_p0 stopped!
[2025-02-14 07:39:43,334][04608] Stopping LearnerWorker_p0...
[2025-02-14 07:39:43,334][04608] Loop learner_proc0_evt_loop terminating...
[2025-02-14 07:39:43,487][04623] Stopping RolloutWorker_w1...
[2025-02-14 07:39:43,488][04623] Loop rollout_proc1_evt_loop terminating...
[2025-02-14 07:39:43,488][00436] Component RolloutWorker_w1 stopped!
[2025-02-14 07:39:43,532][00436] Component RolloutWorker_w0 stopped!
[2025-02-14 07:39:43,542][04621] Stopping RolloutWorker_w0...
[2025-02-14 07:39:43,542][04621] Loop rollout_proc0_evt_loop terminating...
[2025-02-14 07:39:43,551][00436] Component RolloutWorker_w2 stopped!
[2025-02-14 07:39:43,559][04624] Stopping RolloutWorker_w2...
[2025-02-14 07:39:43,559][04624] Loop rollout_proc2_evt_loop terminating...
[2025-02-14 07:39:43,562][00436] Component RolloutWorker_w6 stopped!
[2025-02-14 07:39:43,567][04628] Stopping RolloutWorker_w6...
[2025-02-14 07:39:43,568][04628] Loop rollout_proc6_evt_loop terminating...
[2025-02-14 07:39:43,573][04627] Stopping RolloutWorker_w5...
[2025-02-14 07:39:43,574][04627] Loop rollout_proc5_evt_loop terminating...
[2025-02-14 07:39:43,573][00436] Component RolloutWorker_w4 stopped!
[2025-02-14 07:39:43,577][00436] Component RolloutWorker_w5 stopped!
[2025-02-14 07:39:43,582][04626] Stopping RolloutWorker_w4...
[2025-02-14 07:39:43,583][04626] Loop rollout_proc4_evt_loop terminating...
[2025-02-14 07:39:43,756][04629] Stopping RolloutWorker_w7...
[2025-02-14 07:39:43,759][04629] Loop rollout_proc7_evt_loop terminating...
[2025-02-14 07:39:43,756][00436] Component RolloutWorker_w7 stopped!
[2025-02-14 07:39:43,763][00436] Waiting for process learner_proc0 to stop...
[2025-02-14 07:39:45,451][00436] Waiting for process inference_proc0-0 to join...
[2025-02-14 07:39:45,457][00436] Waiting for process rollout_proc0 to join...
[2025-02-14 07:39:47,695][00436] Waiting for process rollout_proc1 to join...
[2025-02-14 07:39:47,697][00436] Waiting for process rollout_proc2 to join...
[2025-02-14 07:39:47,699][00436] Waiting for process rollout_proc3 to join...
[2025-02-14 07:39:47,700][00436] Waiting for process rollout_proc4 to join...
[2025-02-14 07:39:47,701][00436] Waiting for process rollout_proc5 to join...
[2025-02-14 07:39:47,703][00436] Waiting for process rollout_proc6 to join...
[2025-02-14 07:39:47,705][00436] Waiting for process rollout_proc7 to join...
[2025-02-14 07:39:47,707][00436] Batcher 0 profile tree view:
batching: 24.5308, releasing_batches: 0.0254
[2025-02-14 07:39:47,709][00436] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 403.2455
update_model: 8.3502
weight_update: 0.0022
one_step: 0.0039
handle_policy_step: 567.3101
deserialize: 13.6692, stack: 3.2107, obs_to_device_normalize: 121.5577, forward: 298.6928, send_messages: 24.2719
prepare_outputs: 81.6524
to_cpu: 51.2355
[2025-02-14 07:39:47,710][00436] Learner 0 profile tree view:
misc: 0.0038, prepare_batch: 12.6112
train: 70.2627
epoch_init: 0.0045, minibatch_init: 0.0065, losses_postprocess: 0.6231, kl_divergence: 0.6273, after_optimizer: 33.0908
calculate_losses: 24.2577
losses_init: 0.0032, forward_head: 1.2739, bptt_initial: 16.1137, tail: 1.0098, advantages_returns: 0.2412, losses: 3.4637
bptt: 1.9242
bptt_forward_core: 1.8161
update: 11.0621
clip: 0.8515
[2025-02-14 07:39:47,711][00436] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2696, enqueue_policy_requests: 90.1751, env_step: 811.9072, overhead: 12.5710, complete_rollouts: 8.2949
save_policy_outputs: 20.0645
split_output_tensors: 7.8047
[2025-02-14 07:39:47,713][00436] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2817, enqueue_policy_requests: 125.1016, env_step: 776.3216, overhead: 11.8984, complete_rollouts: 5.8138
save_policy_outputs: 16.8293
split_output_tensors: 6.5355
[2025-02-14 07:39:47,714][00436] Loop Runner_EvtLoop terminating...
[2025-02-14 07:39:47,716][00436] Runner profile tree view:
main_loop: 1042.0112
[2025-02-14 07:39:47,717][00436] Collected {0: 4005888}, FPS: 3844.4
[2025-02-14 07:40:16,022][00436] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-14 07:40:16,024][00436] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-14 07:40:16,026][00436] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-14 07:40:16,028][00436] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-14 07:40:16,030][00436] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-14 07:40:16,032][00436] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-14 07:40:16,033][00436] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-14 07:40:16,035][00436] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-14 07:40:16,036][00436] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-14 07:40:16,037][00436] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-14 07:40:16,038][00436] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-14 07:40:16,039][00436] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-14 07:40:16,040][00436] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-14 07:40:16,041][00436] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-14 07:40:16,042][00436] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-14 07:40:16,080][00436] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:40:16,086][00436] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:40:16,090][00436] RunningMeanStd input shape: (1,)
[2025-02-14 07:40:16,109][00436] ConvEncoder: input_channels=3
[2025-02-14 07:40:16,231][00436] Conv encoder output size: 512
[2025-02-14 07:40:16,233][00436] Policy head output size: 512
[2025-02-14 07:40:16,411][00436] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-14 07:40:17,152][00436] Num frames 100...
[2025-02-14 07:40:17,285][00436] Num frames 200...
[2025-02-14 07:40:17,418][00436] Num frames 300...
[2025-02-14 07:40:17,548][00436] Num frames 400...
[2025-02-14 07:40:17,744][00436] Avg episode rewards: #0: 10.800, true rewards: #0: 4.800
[2025-02-14 07:40:17,746][00436] Avg episode reward: 10.800, avg true_objective: 4.800
[2025-02-14 07:40:17,795][00436] Num frames 500...
[2025-02-14 07:40:17,976][00436] Num frames 600...
[2025-02-14 07:40:18,146][00436] Num frames 700...
[2025-02-14 07:40:18,334][00436] Num frames 800...
[2025-02-14 07:40:18,512][00436] Num frames 900...
[2025-02-14 07:40:18,692][00436] Num frames 1000...
[2025-02-14 07:40:18,866][00436] Num frames 1100...
[2025-02-14 07:40:19,045][00436] Num frames 1200...
[2025-02-14 07:40:19,136][00436] Avg episode rewards: #0: 14.085, true rewards: #0: 6.085
[2025-02-14 07:40:19,138][00436] Avg episode reward: 14.085, avg true_objective: 6.085
[2025-02-14 07:40:19,307][00436] Num frames 1300...
[2025-02-14 07:40:19,495][00436] Num frames 1400...
[2025-02-14 07:40:19,677][00436] Num frames 1500...
[2025-02-14 07:40:19,865][00436] Num frames 1600...
[2025-02-14 07:40:20,012][00436] Num frames 1700...
[2025-02-14 07:40:20,168][00436] Num frames 1800...
[2025-02-14 07:40:20,325][00436] Num frames 1900...
[2025-02-14 07:40:20,462][00436] Num frames 2000...
[2025-02-14 07:40:20,595][00436] Num frames 2100...
[2025-02-14 07:40:20,737][00436] Num frames 2200...
[2025-02-14 07:40:20,865][00436] Num frames 2300...
[2025-02-14 07:40:21,001][00436] Num frames 2400...
[2025-02-14 07:40:21,130][00436] Num frames 2500...
[2025-02-14 07:40:21,276][00436] Num frames 2600...
[2025-02-14 07:40:21,387][00436] Avg episode rewards: #0: 19.810, true rewards: #0: 8.810
[2025-02-14 07:40:21,389][00436] Avg episode reward: 19.810, avg true_objective: 8.810
[2025-02-14 07:40:21,468][00436] Num frames 2700...
[2025-02-14 07:40:21,599][00436] Num frames 2800...
[2025-02-14 07:40:21,737][00436] Num frames 2900...
[2025-02-14 07:40:21,870][00436] Num frames 3000...
[2025-02-14 07:40:22,010][00436] Num frames 3100...
[2025-02-14 07:40:22,156][00436] Num frames 3200...
[2025-02-14 07:40:22,309][00436] Num frames 3300...
[2025-02-14 07:40:22,451][00436] Num frames 3400...
[2025-02-14 07:40:22,581][00436] Num frames 3500...
[2025-02-14 07:40:22,714][00436] Num frames 3600...
[2025-02-14 07:40:22,847][00436] Num frames 3700...
[2025-02-14 07:40:22,978][00436] Num frames 3800...
[2025-02-14 07:40:23,038][00436] Avg episode rewards: #0: 22.508, true rewards: #0: 9.507
[2025-02-14 07:40:23,040][00436] Avg episode reward: 22.508, avg true_objective: 9.507
[2025-02-14 07:40:23,173][00436] Num frames 3900...
[2025-02-14 07:40:23,309][00436] Num frames 4000...
[2025-02-14 07:40:23,451][00436] Num frames 4100...
[2025-02-14 07:40:23,537][00436] Avg episode rewards: #0: 18.846, true rewards: #0: 8.246
[2025-02-14 07:40:23,539][00436] Avg episode reward: 18.846, avg true_objective: 8.246
[2025-02-14 07:40:23,646][00436] Num frames 4200...
[2025-02-14 07:40:23,779][00436] Num frames 4300...
[2025-02-14 07:40:23,911][00436] Num frames 4400...
[2025-02-14 07:40:24,047][00436] Num frames 4500...
[2025-02-14 07:40:24,188][00436] Num frames 4600...
[2025-02-14 07:40:24,326][00436] Num frames 4700...
[2025-02-14 07:40:24,470][00436] Num frames 4800...
[2025-02-14 07:40:24,607][00436] Num frames 4900...
[2025-02-14 07:40:24,739][00436] Num frames 5000...
[2025-02-14 07:40:24,862][00436] Avg episode rewards: #0: 18.918, true rewards: #0: 8.418
[2025-02-14 07:40:24,864][00436] Avg episode reward: 18.918, avg true_objective: 8.418
[2025-02-14 07:40:24,934][00436] Num frames 5100...
[2025-02-14 07:40:25,069][00436] Num frames 5200...
[2025-02-14 07:40:25,212][00436] Num frames 5300...
[2025-02-14 07:40:25,344][00436] Num frames 5400...
[2025-02-14 07:40:25,488][00436] Num frames 5500...
[2025-02-14 07:40:25,621][00436] Num frames 5600...
[2025-02-14 07:40:25,751][00436] Num frames 5700...
[2025-02-14 07:40:25,884][00436] Num frames 5800...
[2025-02-14 07:40:26,018][00436] Num frames 5900...
[2025-02-14 07:40:26,147][00436] Num frames 6000...
[2025-02-14 07:40:26,286][00436] Num frames 6100...
[2025-02-14 07:40:26,429][00436] Num frames 6200...
[2025-02-14 07:40:26,562][00436] Num frames 6300...
[2025-02-14 07:40:26,694][00436] Num frames 6400...
[2025-02-14 07:40:26,831][00436] Num frames 6500...
[2025-02-14 07:40:27,002][00436] Avg episode rewards: #0: 21.696, true rewards: #0: 9.410
[2025-02-14 07:40:27,004][00436] Avg episode reward: 21.696, avg true_objective: 9.410
[2025-02-14 07:40:27,025][00436] Num frames 6600...
[2025-02-14 07:40:27,157][00436] Num frames 6700...
[2025-02-14 07:40:27,298][00436] Num frames 6800...
[2025-02-14 07:40:27,439][00436] Num frames 6900...
[2025-02-14 07:40:27,573][00436] Num frames 7000...
[2025-02-14 07:40:27,710][00436] Num frames 7100...
[2025-02-14 07:40:27,840][00436] Num frames 7200...
[2025-02-14 07:40:27,969][00436] Num frames 7300...
[2025-02-14 07:40:28,100][00436] Num frames 7400...
[2025-02-14 07:40:28,182][00436] Avg episode rewards: #0: 21.024, true rewards: #0: 9.274
[2025-02-14 07:40:28,187][00436] Avg episode reward: 21.024, avg true_objective: 9.274
[2025-02-14 07:40:28,299][00436] Num frames 7500...
[2025-02-14 07:40:28,429][00436] Num frames 7600...
[2025-02-14 07:40:28,566][00436] Num frames 7700...
[2025-02-14 07:40:28,697][00436] Num frames 7800...
[2025-02-14 07:40:28,830][00436] Num frames 7900...
[2025-02-14 07:40:28,972][00436] Num frames 8000...
[2025-02-14 07:40:29,111][00436] Num frames 8100...
[2025-02-14 07:40:29,255][00436] Num frames 8200...
[2025-02-14 07:40:29,421][00436] Avg episode rewards: #0: 20.537, true rewards: #0: 9.203
[2025-02-14 07:40:29,422][00436] Avg episode reward: 20.537, avg true_objective: 9.203
[2025-02-14 07:40:29,449][00436] Num frames 8300...
[2025-02-14 07:40:29,593][00436] Num frames 8400...
[2025-02-14 07:40:29,726][00436] Num frames 8500...
[2025-02-14 07:40:29,863][00436] Num frames 8600...
[2025-02-14 07:40:30,044][00436] Num frames 8700...
[2025-02-14 07:40:30,225][00436] Num frames 8800...
[2025-02-14 07:40:30,389][00436] Avg episode rewards: #0: 19.459, true rewards: #0: 8.859
[2025-02-14 07:40:30,391][00436] Avg episode reward: 19.459, avg true_objective: 8.859
[2025-02-14 07:41:25,146][00436] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-14 07:43:28,512][00436] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-14 07:43:28,514][00436] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-14 07:43:28,517][00436] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-14 07:43:28,520][00436] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-14 07:43:28,523][00436] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-14 07:43:28,526][00436] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-14 07:43:28,528][00436] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-14 07:43:28,530][00436] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-14 07:43:28,533][00436] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-14 07:43:28,534][00436] Adding new argument 'hf_repository'='gyaan/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-14 07:43:28,535][00436] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-14 07:43:28,537][00436] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-14 07:43:28,538][00436] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-14 07:43:28,540][00436] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-14 07:43:28,542][00436] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-14 07:43:28,642][00436] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:43:28,647][00436] RunningMeanStd input shape: (1,)
[2025-02-14 07:43:28,721][00436] ConvEncoder: input_channels=3
[2025-02-14 07:43:28,926][00436] Conv encoder output size: 512
[2025-02-14 07:43:28,931][00436] Policy head output size: 512
[2025-02-14 07:43:28,957][00436] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-14 07:43:29,841][00436] Num frames 100...
[2025-02-14 07:43:30,020][00436] Num frames 200...
[2025-02-14 07:43:30,212][00436] Num frames 300...
[2025-02-14 07:43:30,401][00436] Num frames 400...
[2025-02-14 07:43:30,537][00436] Num frames 500...
[2025-02-14 07:43:30,673][00436] Num frames 600...
[2025-02-14 07:43:30,822][00436] Num frames 700...
[2025-02-14 07:43:30,884][00436] Avg episode rewards: #0: 13.040, true rewards: #0: 7.040
[2025-02-14 07:43:30,886][00436] Avg episode reward: 13.040, avg true_objective: 7.040
[2025-02-14 07:43:31,013][00436] Num frames 800...
[2025-02-14 07:43:31,144][00436] Num frames 900...
[2025-02-14 07:43:31,281][00436] Num frames 1000...
[2025-02-14 07:43:31,413][00436] Num frames 1100...
[2025-02-14 07:43:31,544][00436] Num frames 1200...
[2025-02-14 07:43:31,675][00436] Num frames 1300...
[2025-02-14 07:43:31,825][00436] Num frames 1400...
[2025-02-14 07:43:31,956][00436] Num frames 1500...
[2025-02-14 07:43:32,088][00436] Avg episode rewards: #0: 16.285, true rewards: #0: 7.785
[2025-02-14 07:43:32,089][00436] Avg episode reward: 16.285, avg true_objective: 7.785
[2025-02-14 07:43:32,150][00436] Num frames 1600...
[2025-02-14 07:43:32,290][00436] Num frames 1700...
[2025-02-14 07:43:32,421][00436] Num frames 1800...
[2025-02-14 07:43:32,551][00436] Num frames 1900...
[2025-02-14 07:43:32,681][00436] Num frames 2000...
[2025-02-14 07:43:32,815][00436] Num frames 2100...
[2025-02-14 07:43:32,948][00436] Num frames 2200...
[2025-02-14 07:43:33,080][00436] Num frames 2300...
[2025-02-14 07:43:33,225][00436] Num frames 2400...
[2025-02-14 07:43:33,357][00436] Num frames 2500...
[2025-02-14 07:43:33,487][00436] Num frames 2600...
[2025-02-14 07:43:33,620][00436] Num frames 2700...
[2025-02-14 07:43:33,750][00436] Num frames 2800...
[2025-02-14 07:43:33,896][00436] Num frames 2900...
[2025-02-14 07:43:34,026][00436] Num frames 3000...
[2025-02-14 07:43:34,155][00436] Num frames 3100...
[2025-02-14 07:43:34,296][00436] Num frames 3200...
[2025-02-14 07:43:34,429][00436] Num frames 3300...
[2025-02-14 07:43:34,561][00436] Num frames 3400...
[2025-02-14 07:43:34,692][00436] Num frames 3500...
[2025-02-14 07:43:34,822][00436] Num frames 3600...
[2025-02-14 07:43:34,959][00436] Avg episode rewards: #0: 29.190, true rewards: #0: 12.190
[2025-02-14 07:43:34,961][00436] Avg episode reward: 29.190, avg true_objective: 12.190
[2025-02-14 07:43:35,016][00436] Num frames 3700...
[2025-02-14 07:43:35,146][00436] Num frames 3800...
[2025-02-14 07:43:35,286][00436] Num frames 3900...
[2025-02-14 07:43:35,415][00436] Num frames 4000...
[2025-02-14 07:43:35,544][00436] Num frames 4100...
[2025-02-14 07:43:35,674][00436] Num frames 4200...
[2025-02-14 07:43:35,803][00436] Num frames 4300...
[2025-02-14 07:43:35,942][00436] Num frames 4400...
[2025-02-14 07:43:36,072][00436] Num frames 4500...
[2025-02-14 07:43:36,158][00436] Avg episode rewards: #0: 25.802, true rewards: #0: 11.303
[2025-02-14 07:43:36,160][00436] Avg episode reward: 25.802, avg true_objective: 11.303
[2025-02-14 07:43:36,271][00436] Num frames 4600...
[2025-02-14 07:43:36,400][00436] Num frames 4700...
[2025-02-14 07:43:36,535][00436] Num frames 4800...
[2025-02-14 07:43:36,665][00436] Num frames 4900...
[2025-02-14 07:43:36,796][00436] Num frames 5000...
[2025-02-14 07:43:36,943][00436] Num frames 5100...
[2025-02-14 07:43:37,074][00436] Num frames 5200...
[2025-02-14 07:43:37,212][00436] Num frames 5300...
[2025-02-14 07:43:37,338][00436] Avg episode rewards: #0: 24.706, true rewards: #0: 10.706
[2025-02-14 07:43:37,340][00436] Avg episode reward: 24.706, avg true_objective: 10.706
[2025-02-14 07:43:37,402][00436] Num frames 5400...
[2025-02-14 07:43:37,535][00436] Num frames 5500...
[2025-02-14 07:43:37,666][00436] Num frames 5600...
[2025-02-14 07:43:37,796][00436] Num frames 5700...
[2025-02-14 07:43:37,929][00436] Num frames 5800...
[2025-02-14 07:43:38,065][00436] Num frames 5900...
[2025-02-14 07:43:38,207][00436] Num frames 6000...
[2025-02-14 07:43:38,381][00436] Avg episode rewards: #0: 23.315, true rewards: #0: 10.148
[2025-02-14 07:43:38,384][00436] Avg episode reward: 23.315, avg true_objective: 10.148
[2025-02-14 07:43:38,401][00436] Num frames 6100...
[2025-02-14 07:43:38,532][00436] Num frames 6200...
[2025-02-14 07:43:38,662][00436] Num frames 6300...
[2025-02-14 07:43:38,799][00436] Num frames 6400...
[2025-02-14 07:43:38,867][00436] Avg episode rewards: #0: 20.584, true rewards: #0: 9.156
[2025-02-14 07:43:38,869][00436] Avg episode reward: 20.584, avg true_objective: 9.156
[2025-02-14 07:43:38,997][00436] Num frames 6500...
[2025-02-14 07:43:39,131][00436] Num frames 6600...
[2025-02-14 07:43:39,267][00436] Num frames 6700...
[2025-02-14 07:43:39,396][00436] Num frames 6800...
[2025-02-14 07:43:39,528][00436] Num frames 6900...
[2025-02-14 07:43:39,660][00436] Num frames 7000...
[2025-02-14 07:43:39,821][00436] Avg episode rewards: #0: 19.976, true rewards: #0: 8.851
[2025-02-14 07:43:39,823][00436] Avg episode reward: 19.976, avg true_objective: 8.851
[2025-02-14 07:43:39,855][00436] Num frames 7100...
[2025-02-14 07:43:39,992][00436] Num frames 7200...
[2025-02-14 07:43:40,129][00436] Num frames 7300...
[2025-02-14 07:43:40,266][00436] Num frames 7400...
[2025-02-14 07:43:40,410][00436] Num frames 7500...
[2025-02-14 07:43:40,591][00436] Num frames 7600...
[2025-02-14 07:43:40,765][00436] Num frames 7700...
[2025-02-14 07:43:40,940][00436] Num frames 7800...
[2025-02-14 07:43:41,120][00436] Num frames 7900...
[2025-02-14 07:43:41,308][00436] Num frames 8000...
[2025-02-14 07:43:41,479][00436] Num frames 8100...
[2025-02-14 07:43:41,649][00436] Num frames 8200...
[2025-02-14 07:43:41,826][00436] Num frames 8300...
[2025-02-14 07:43:42,004][00436] Num frames 8400...
[2025-02-14 07:43:42,116][00436] Avg episode rewards: #0: 21.254, true rewards: #0: 9.366
[2025-02-14 07:43:42,118][00436] Avg episode reward: 21.254, avg true_objective: 9.366
[2025-02-14 07:43:42,260][00436] Num frames 8500...
[2025-02-14 07:43:42,445][00436] Num frames 8600...
[2025-02-14 07:43:42,625][00436] Num frames 8700...
[2025-02-14 07:43:42,765][00436] Num frames 8800...
[2025-02-14 07:43:42,898][00436] Num frames 8900...
[2025-02-14 07:43:43,032][00436] Num frames 9000...
[2025-02-14 07:43:43,178][00436] Num frames 9100...
[2025-02-14 07:43:43,309][00436] Num frames 9200...
[2025-02-14 07:43:43,449][00436] Num frames 9300...
[2025-02-14 07:43:43,580][00436] Num frames 9400...
[2025-02-14 07:43:43,685][00436] Avg episode rewards: #0: 21.437, true rewards: #0: 9.437
[2025-02-14 07:43:43,686][00436] Avg episode reward: 21.437, avg true_objective: 9.437
[2025-02-14 07:44:38,128][00436] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-14 07:44:42,755][00436] The model has been pushed to https://huggingface.co/gyaan/rl_course_vizdoom_health_gathering_supreme
[2025-02-14 07:48:00,387][00436] Environment doom_basic already registered, overwriting...
[2025-02-14 07:48:00,389][00436] Environment doom_two_colors_easy already registered, overwriting...
[2025-02-14 07:48:00,391][00436] Environment doom_two_colors_hard already registered, overwriting...
[2025-02-14 07:48:00,393][00436] Environment doom_dm already registered, overwriting...
[2025-02-14 07:48:00,398][00436] Environment doom_dwango5 already registered, overwriting...
[2025-02-14 07:48:00,399][00436] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-02-14 07:48:00,400][00436] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-02-14 07:48:00,401][00436] Environment doom_my_way_home already registered, overwriting...
[2025-02-14 07:48:00,405][00436] Environment doom_deadly_corridor already registered, overwriting...
[2025-02-14 07:48:00,406][00436] Environment doom_defend_the_center already registered, overwriting...
[2025-02-14 07:48:00,407][00436] Environment doom_defend_the_line already registered, overwriting...
[2025-02-14 07:48:00,408][00436] Environment doom_health_gathering already registered, overwriting...
[2025-02-14 07:48:00,409][00436] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-02-14 07:48:00,413][00436] Environment doom_battle already registered, overwriting...
[2025-02-14 07:48:00,414][00436] Environment doom_battle2 already registered, overwriting...
[2025-02-14 07:48:00,415][00436] Environment doom_duel_bots already registered, overwriting...
[2025-02-14 07:48:00,416][00436] Environment doom_deathmatch_bots already registered, overwriting...
[2025-02-14 07:48:00,417][00436] Environment doom_duel already registered, overwriting...
[2025-02-14 07:48:00,417][00436] Environment doom_deathmatch_full already registered, overwriting...
[2025-02-14 07:48:00,418][00436] Environment doom_benchmark already registered, overwriting...
[2025-02-14 07:48:00,419][00436] register_encoder_factory: <function make_vizdoom_encoder at 0x790af59fec00>
[2025-02-14 07:48:00,444][00436] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-14 07:48:00,453][00436] Overriding arg 'train_for_env_steps' with value 5000000 passed from command line
[2025-02-14 07:48:00,465][00436] Experiment dir /content/train_dir/default_experiment already exists!
[2025-02-14 07:48:00,467][00436] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-02-14 07:48:00,468][00436] Weights and Biases integration disabled
[2025-02-14 07:48:00,473][00436] Environment var CUDA_VISIBLE_DEVICES is 0
[2025-02-14 07:48:03,692][00436] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=5000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-02-14 07:48:03,694][00436] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-14 07:48:03,696][00436] Rollout worker 0 uses device cpu
[2025-02-14 07:48:03,699][00436] Rollout worker 1 uses device cpu
[2025-02-14 07:48:03,701][00436] Rollout worker 2 uses device cpu
[2025-02-14 07:48:03,702][00436] Rollout worker 3 uses device cpu
[2025-02-14 07:48:03,703][00436] Rollout worker 4 uses device cpu
[2025-02-14 07:48:03,710][00436] Rollout worker 5 uses device cpu
[2025-02-14 07:48:03,712][00436] Rollout worker 6 uses device cpu
[2025-02-14 07:48:03,713][00436] Rollout worker 7 uses device cpu
[2025-02-14 07:48:03,787][00436] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:48:03,790][00436] InferenceWorker_p0-w0: min num requests: 2
[2025-02-14 07:48:03,821][00436] Starting all processes...
[2025-02-14 07:48:03,822][00436] Starting process learner_proc0
[2025-02-14 07:48:03,886][00436] Starting all processes...
[2025-02-14 07:48:03,898][00436] Starting process inference_proc0-0
[2025-02-14 07:48:03,898][00436] Starting process rollout_proc0
[2025-02-14 07:48:03,899][00436] Starting process rollout_proc1
[2025-02-14 07:48:03,899][00436] Starting process rollout_proc2
[2025-02-14 07:48:03,900][00436] Starting process rollout_proc3
[2025-02-14 07:48:03,900][00436] Starting process rollout_proc4
[2025-02-14 07:48:03,901][00436] Starting process rollout_proc5
[2025-02-14 07:48:03,903][00436] Starting process rollout_proc6
[2025-02-14 07:48:03,903][00436] Starting process rollout_proc7
[2025-02-14 07:48:19,194][13629] Worker 4 uses CPU cores [0]
[2025-02-14 07:48:19,300][13627] Worker 2 uses CPU cores [0]
[2025-02-14 07:48:19,403][13631] Worker 6 uses CPU cores [0]
[2025-02-14 07:48:19,410][13630] Worker 5 uses CPU cores [1]
[2025-02-14 07:48:19,419][13626] Worker 1 uses CPU cores [1]
[2025-02-14 07:48:19,511][13607] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:48:19,512][13607] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-14 07:48:19,524][13632] Worker 7 uses CPU cores [1]
[2025-02-14 07:48:19,540][13624] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:48:19,541][13624] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-14 07:48:19,565][13624] Num visible devices: 1
[2025-02-14 07:48:19,566][13607] Num visible devices: 1
[2025-02-14 07:48:19,575][13628] Worker 3 uses CPU cores [1]
[2025-02-14 07:48:19,579][13607] Starting seed is not provided
[2025-02-14 07:48:19,579][13607] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:48:19,579][13607] Initializing actor-critic model on device cuda:0
[2025-02-14 07:48:19,580][13607] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:48:19,581][13607] RunningMeanStd input shape: (1,)
[2025-02-14 07:48:19,592][13625] Worker 0 uses CPU cores [0]
[2025-02-14 07:48:19,600][13607] ConvEncoder: input_channels=3
[2025-02-14 07:48:19,718][13607] Conv encoder output size: 512
[2025-02-14 07:48:19,718][13607] Policy head output size: 512
[2025-02-14 07:48:19,734][13607] Created Actor Critic model with architecture:
[2025-02-14 07:48:19,734][13607] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
[2025-02-14 07:48:19,858][13607] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-14 07:48:21,021][13607] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-14 07:48:21,059][13607] Loading model from checkpoint
[2025-02-14 07:48:21,061][13607] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2025-02-14 07:48:21,061][13607] Initialized policy 0 weights for model version 978
[2025-02-14 07:48:21,063][13607] LearnerWorker_p0 finished initialization!
[2025-02-14 07:48:21,064][13607] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-14 07:48:21,190][13624] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:48:21,192][13624] RunningMeanStd input shape: (1,)
[2025-02-14 07:48:21,204][13624] ConvEncoder: input_channels=3
[2025-02-14 07:48:21,305][13624] Conv encoder output size: 512
[2025-02-14 07:48:21,305][13624] Policy head output size: 512
[2025-02-14 07:48:21,343][00436] Inference worker 0-0 is ready!
[2025-02-14 07:48:21,344][00436] All inference workers are ready! Signal rollout workers to start!
[2025-02-14 07:48:21,595][13629] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,670][13631] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,668][13628] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,675][13632] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,697][13625] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,723][13630] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,740][13627] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:21,774][13626] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-14 07:48:23,118][13628] Decorrelating experience for 0 frames...
[2025-02-14 07:48:23,114][13632] Decorrelating experience for 0 frames...
[2025-02-14 07:48:23,593][13629] Decorrelating experience for 0 frames...
[2025-02-14 07:48:23,632][13631] Decorrelating experience for 0 frames...
[2025-02-14 07:48:23,653][13625] Decorrelating experience for 0 frames...
[2025-02-14 07:48:23,673][13627] Decorrelating experience for 0 frames...
[2025-02-14 07:48:23,779][00436] Heartbeat connected on Batcher_0
[2025-02-14 07:48:23,784][00436] Heartbeat connected on LearnerWorker_p0
[2025-02-14 07:48:23,814][00436] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-14 07:48:24,111][13632] Decorrelating experience for 32 frames...
[2025-02-14 07:48:24,201][13630] Decorrelating experience for 0 frames...
[2025-02-14 07:48:24,527][13631] Decorrelating experience for 32 frames...
[2025-02-14 07:48:24,597][13627] Decorrelating experience for 32 frames...
[2025-02-14 07:48:24,801][13628] Decorrelating experience for 32 frames...
[2025-02-14 07:48:25,297][13625] Decorrelating experience for 32 frames...
[2025-02-14 07:48:25,301][13626] Decorrelating experience for 0 frames...
[2025-02-14 07:48:25,473][00436] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-14 07:48:26,119][13630] Decorrelating experience for 32 frames...
[2025-02-14 07:48:26,149][13631] Decorrelating experience for 64 frames...
[2025-02-14 07:48:26,748][13628] Decorrelating experience for 64 frames...
[2025-02-14 07:48:26,998][13632] Decorrelating experience for 64 frames...
[2025-02-14 07:48:27,257][13629] Decorrelating experience for 32 frames...
[2025-02-14 07:48:27,730][13625] Decorrelating experience for 64 frames...
[2025-02-14 07:48:28,211][13630] Decorrelating experience for 64 frames...
[2025-02-14 07:48:28,218][13627] Decorrelating experience for 64 frames...
[2025-02-14 07:48:28,639][13628] Decorrelating experience for 96 frames...
[2025-02-14 07:48:29,022][00436] Heartbeat connected on RolloutWorker_w3
[2025-02-14 07:48:29,138][13631] Decorrelating experience for 96 frames...
[2025-02-14 07:48:29,793][00436] Heartbeat connected on RolloutWorker_w6
[2025-02-14 07:48:30,180][13629] Decorrelating experience for 64 frames...
[2025-02-14 07:48:30,291][13632] Decorrelating experience for 96 frames...
[2025-02-14 07:48:30,301][13626] Decorrelating experience for 32 frames...
[2025-02-14 07:48:30,421][13625] Decorrelating experience for 96 frames...
[2025-02-14 07:48:30,473][00436] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 12.0. Samples: 60. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-14 07:48:30,600][00436] Heartbeat connected on RolloutWorker_w7
[2025-02-14 07:48:30,889][00436] Heartbeat connected on RolloutWorker_w0
[2025-02-14 07:48:31,876][13627] Decorrelating experience for 96 frames...
[2025-02-14 07:48:32,485][00436] Heartbeat connected on RolloutWorker_w2
[2025-02-14 07:48:32,744][13630] Decorrelating experience for 96 frames...
[2025-02-14 07:48:33,274][00436] Heartbeat connected on RolloutWorker_w5
[2025-02-14 07:48:33,899][13626] Decorrelating experience for 64 frames...
[2025-02-14 07:48:34,563][13607] Signal inference workers to stop experience collection...
[2025-02-14 07:48:34,588][13624] InferenceWorker_p0-w0: stopping experience collection
[2025-02-14 07:48:35,005][13626] Decorrelating experience for 96 frames...
[2025-02-14 07:48:35,045][13629] Decorrelating experience for 96 frames...
[2025-02-14 07:48:35,127][00436] Heartbeat connected on RolloutWorker_w4
[2025-02-14 07:48:35,173][00436] Heartbeat connected on RolloutWorker_w1
[2025-02-14 07:48:35,473][00436] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 179.2. Samples: 1792. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-14 07:48:35,485][00436] Avg episode reward: [(0, '5.519')]
[2025-02-14 07:48:35,556][13607] Signal inference workers to resume experience collection...
[2025-02-14 07:48:35,557][13624] InferenceWorker_p0-w0: resuming experience collection
[2025-02-14 07:48:40,474][00436] Fps is (10 sec: 2457.5, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4030464. Throughput: 0: 439.6. Samples: 6594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:48:40,476][00436] Avg episode reward: [(0, '10.389')]
[2025-02-14 07:48:45,001][13624] Updated weights for policy 0, policy_version 988 (0.0027)
[2025-02-14 07:48:45,473][00436] Fps is (10 sec: 4096.0, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4046848. Throughput: 0: 561.5. Samples: 11230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:48:45,480][00436] Avg episode reward: [(0, '14.393')]
[2025-02-14 07:48:50,473][00436] Fps is (10 sec: 4096.1, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 4071424. Throughput: 0: 590.2. Samples: 14756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:48:50,478][00436] Avg episode reward: [(0, '16.854')]
[2025-02-14 07:48:54,104][13624] Updated weights for policy 0, policy_version 998 (0.0029)
[2025-02-14 07:48:55,474][00436] Fps is (10 sec: 4505.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 4091904. Throughput: 0: 706.3. Samples: 21188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:48:55,478][00436] Avg episode reward: [(0, '17.677')]
[2025-02-14 07:49:00,473][00436] Fps is (10 sec: 3276.8, 60 sec: 2808.7, 300 sec: 2808.7). Total num frames: 4104192. Throughput: 0: 737.3. Samples: 25804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:49:00,478][00436] Avg episode reward: [(0, '19.975')]
[2025-02-14 07:49:05,091][13624] Updated weights for policy 0, policy_version 1008 (0.0021)
[2025-02-14 07:49:05,473][00436] Fps is (10 sec: 3686.6, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 4128768. Throughput: 0: 731.0. Samples: 29238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:49:05,479][00436] Avg episode reward: [(0, '22.568')]
[2025-02-14 07:49:10,473][00436] Fps is (10 sec: 4505.6, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 4149248. Throughput: 0: 804.5. Samples: 36204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:49:10,477][00436] Avg episode reward: [(0, '25.940')]
[2025-02-14 07:49:15,473][00436] Fps is (10 sec: 3686.4, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 4165632. Throughput: 0: 907.2. Samples: 40886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:49:15,480][00436] Avg episode reward: [(0, '26.159')]
[2025-02-14 07:49:16,080][13624] Updated weights for policy 0, policy_version 1018 (0.0026)
[2025-02-14 07:49:20,473][00436] Fps is (10 sec: 4096.0, 60 sec: 3351.3, 300 sec: 3351.3). Total num frames: 4190208. Throughput: 0: 946.0. Samples: 44364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:49:20,482][00436] Avg episode reward: [(0, '28.399')]
[2025-02-14 07:49:20,489][13607] Saving new best policy, reward=28.399!
[2025-02-14 07:49:24,782][13624] Updated weights for policy 0, policy_version 1028 (0.0015)
[2025-02-14 07:49:25,476][00436] Fps is (10 sec: 4504.4, 60 sec: 3413.2, 300 sec: 3413.2). Total num frames: 4210688. Throughput: 0: 993.7. Samples: 51312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:49:25,484][00436] Avg episode reward: [(0, '29.158')]
[2025-02-14 07:49:25,486][13607] Saving new best policy, reward=29.158!
[2025-02-14 07:49:30,473][00436] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 4222976. Throughput: 0: 987.0. Samples: 55646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:49:30,475][00436] Avg episode reward: [(0, '27.438')]
[2025-02-14 07:49:35,473][00436] Fps is (10 sec: 3687.4, 60 sec: 4027.7, 300 sec: 3452.3). Total num frames: 4247552. Throughput: 0: 983.4. Samples: 59008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:49:35,480][00436] Avg episode reward: [(0, '26.877')]
[2025-02-14 07:49:35,970][13624] Updated weights for policy 0, policy_version 1038 (0.0012)
[2025-02-14 07:49:40,474][00436] Fps is (10 sec: 4505.4, 60 sec: 3959.5, 300 sec: 3495.2). Total num frames: 4268032. Throughput: 0: 993.6. Samples: 65898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:49:40,476][00436] Avg episode reward: [(0, '25.331')]
[2025-02-14 07:49:45,473][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3481.6). Total num frames: 4284416. Throughput: 0: 993.7. Samples: 70522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:49:45,476][00436] Avg episode reward: [(0, '25.188')]
[2025-02-14 07:49:46,752][13624] Updated weights for policy 0, policy_version 1048 (0.0015)
[2025-02-14 07:49:50,473][00436] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3565.9). Total num frames: 4308992. Throughput: 0: 996.2. Samples: 74068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:49:50,478][00436] Avg episode reward: [(0, '24.083')]
[2025-02-14 07:49:55,476][00436] Fps is (10 sec: 4504.5, 60 sec: 3959.3, 300 sec: 3595.3). Total num frames: 4329472. Throughput: 0: 998.6. Samples: 81144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:49:55,480][00436] Avg episode reward: [(0, '25.341')]
[2025-02-14 07:49:56,023][13624] Updated weights for policy 0, policy_version 1058 (0.0014)
[2025-02-14 07:50:00,477][00436] Fps is (10 sec: 3685.1, 60 sec: 4027.5, 300 sec: 3578.5). Total num frames: 4345856. Throughput: 0: 992.5. Samples: 85552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:50:00,486][00436] Avg episode reward: [(0, '25.838')]
[2025-02-14 07:50:00,498][13607] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001061_4345856.pth...
[2025-02-14 07:50:00,627][13607] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000894_3661824.pth
[2025-02-14 07:50:05,473][00436] Fps is (10 sec: 3687.3, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 4366336. Throughput: 0: 989.3. Samples: 88882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:50:05,480][00436] Avg episode reward: [(0, '25.862')]
[2025-02-14 07:50:06,789][13624] Updated weights for policy 0, policy_version 1068 (0.0016)
[2025-02-14 07:50:10,475][00436] Fps is (10 sec: 4506.4, 60 sec: 4027.6, 300 sec: 3666.8). Total num frames: 4390912. Throughput: 0: 987.1. Samples: 95732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:50:10,478][00436] Avg episode reward: [(0, '27.260')]
[2025-02-14 07:50:15,473][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3611.9). Total num frames: 4403200. Throughput: 0: 994.1. Samples: 100382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:50:15,480][00436] Avg episode reward: [(0, '26.786')]
[2025-02-14 07:50:17,519][13624] Updated weights for policy 0, policy_version 1078 (0.0023)
[2025-02-14 07:50:20,473][00436] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3668.6). Total num frames: 4427776. Throughput: 0: 997.2. Samples: 103884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:50:20,478][00436] Avg episode reward: [(0, '26.185')]
[2025-02-14 07:50:25,473][00436] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3686.4). Total num frames: 4448256. Throughput: 0: 996.1. Samples: 110720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:50:25,479][00436] Avg episode reward: [(0, '25.814')]
[2025-02-14 07:50:27,557][13624] Updated weights for policy 0, policy_version 1088 (0.0013)
[2025-02-14 07:50:30,473][00436] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3637.2). Total num frames: 4460544. Throughput: 0: 991.2. Samples: 115124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:50:30,476][00436] Avg episode reward: [(0, '25.918')]
[2025-02-14 07:50:35,473][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 4485120. Throughput: 0: 985.7. Samples: 118424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:50:35,476][00436] Avg episode reward: [(0, '24.770')]
[2025-02-14 07:50:37,525][13624] Updated weights for policy 0, policy_version 1098 (0.0015)
[2025-02-14 07:50:40,473][00436] Fps is (10 sec: 4915.2, 60 sec: 4027.8, 300 sec: 3731.9). Total num frames: 4509696. Throughput: 0: 986.2. Samples: 125520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:50:40,476][00436] Avg episode reward: [(0, '24.364')]
[2025-02-14 07:50:45,474][00436] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3686.4). Total num frames: 4521984. Throughput: 0: 997.4. Samples: 130434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:50:45,482][00436] Avg episode reward: [(0, '24.075')]
[2025-02-14 07:50:48,142][13624] Updated weights for policy 0, policy_version 1108 (0.0026)
[2025-02-14 07:50:50,473][00436] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3728.8). Total num frames: 4546560. Throughput: 0: 999.6. Samples: 133866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:50:50,475][00436] Avg episode reward: [(0, '23.896')]
[2025-02-14 07:50:55,473][00436] Fps is (10 sec: 4915.4, 60 sec: 4027.9, 300 sec: 3768.3). Total num frames: 4571136. Throughput: 0: 1006.4. Samples: 141018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:50:55,478][00436] Avg episode reward: [(0, '24.519')]
[2025-02-14 07:50:57,741][13624] Updated weights for policy 0, policy_version 1118 (0.0015)
[2025-02-14 07:51:00,475][00436] Fps is (10 sec: 3685.9, 60 sec: 3959.6, 300 sec: 3726.0). Total num frames: 4583424. Throughput: 0: 1009.6. Samples: 145816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:51:00,482][00436] Avg episode reward: [(0, '25.762')]
[2025-02-14 07:51:05,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3763.2). Total num frames: 4608000. Throughput: 0: 1009.6. Samples: 149316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-14 07:51:05,476][00436] Avg episode reward: [(0, '26.854')]
[2025-02-14 07:51:07,433][13624] Updated weights for policy 0, policy_version 1128 (0.0021)
[2025-02-14 07:51:10,473][00436] Fps is (10 sec: 4915.9, 60 sec: 4027.8, 300 sec: 3798.1). Total num frames: 4632576. Throughput: 0: 1016.5. Samples: 156464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:51:10,475][00436] Avg episode reward: [(0, '26.332')]
[2025-02-14 07:51:15,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3758.7). Total num frames: 4644864. Throughput: 0: 1027.6. Samples: 161368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:51:15,476][00436] Avg episode reward: [(0, '25.744')]
[2025-02-14 07:51:18,084][13624] Updated weights for policy 0, policy_version 1138 (0.0020)
[2025-02-14 07:51:20,474][00436] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3791.7). Total num frames: 4669440. Throughput: 0: 1031.1. Samples: 164824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:51:20,478][00436] Avg episode reward: [(0, '26.056')]
[2025-02-14 07:51:25,473][00436] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3822.9). Total num frames: 4694016. Throughput: 0: 1032.4. Samples: 171980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:51:25,475][00436] Avg episode reward: [(0, '24.355')]
[2025-02-14 07:51:27,589][13624] Updated weights for policy 0, policy_version 1148 (0.0018)
[2025-02-14 07:51:30,477][00436] Fps is (10 sec: 3685.2, 60 sec: 4095.8, 300 sec: 3786.0). Total num frames: 4706304. Throughput: 0: 1028.2. Samples: 176708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:51:30,482][00436] Avg episode reward: [(0, '24.539')]
[2025-02-14 07:51:35,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3815.7). Total num frames: 4730880. Throughput: 0: 1027.6. Samples: 180106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:51:35,476][00436] Avg episode reward: [(0, '24.743')]
[2025-02-14 07:51:37,471][13624] Updated weights for policy 0, policy_version 1158 (0.0018)
[2025-02-14 07:51:40,474][00436] Fps is (10 sec: 4916.7, 60 sec: 4096.0, 300 sec: 3843.9). Total num frames: 4755456. Throughput: 0: 1028.2. Samples: 187286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:51:40,477][00436] Avg episode reward: [(0, '26.191')]
[2025-02-14 07:51:45,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3809.3). Total num frames: 4767744. Throughput: 0: 1028.9. Samples: 192116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:51:45,476][00436] Avg episode reward: [(0, '26.387')]
[2025-02-14 07:51:48,010][13624] Updated weights for policy 0, policy_version 1168 (0.0017)
[2025-02-14 07:51:50,473][00436] Fps is (10 sec: 3686.6, 60 sec: 4096.0, 300 sec: 3836.3). Total num frames: 4792320. Throughput: 0: 1029.6. Samples: 195648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:51:50,476][00436] Avg episode reward: [(0, '26.799')]
[2025-02-14 07:51:55,473][00436] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3861.9). Total num frames: 4816896. Throughput: 0: 1028.6. Samples: 202750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-14 07:51:55,477][00436] Avg episode reward: [(0, '26.529')]
[2025-02-14 07:51:57,355][13624] Updated weights for policy 0, policy_version 1178 (0.0024)
[2025-02-14 07:52:00,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3829.3). Total num frames: 4829184. Throughput: 0: 1025.3. Samples: 207506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:52:00,478][00436] Avg episode reward: [(0, '27.132')]
[2025-02-14 07:52:00,518][13607] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001180_4833280.pth...
[2025-02-14 07:52:00,666][13607] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2025-02-14 07:52:05,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3854.0). Total num frames: 4853760. Throughput: 0: 1021.8. Samples: 210804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-14 07:52:05,476][00436] Avg episode reward: [(0, '27.476')]
[2025-02-14 07:52:07,449][13624] Updated weights for policy 0, policy_version 1188 (0.0027)
[2025-02-14 07:52:10,473][00436] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3877.5). Total num frames: 4878336. Throughput: 0: 1019.9. Samples: 217874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:52:10,476][00436] Avg episode reward: [(0, '27.026')]
[2025-02-14 07:52:15,473][00436] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3864.5). Total num frames: 4894720. Throughput: 0: 1020.7. Samples: 222634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:52:15,477][00436] Avg episode reward: [(0, '26.811')]
[2025-02-14 07:52:18,001][13624] Updated weights for policy 0, policy_version 1198 (0.0015)
[2025-02-14 07:52:20,473][00436] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3869.4). Total num frames: 4915200. Throughput: 0: 1023.9. Samples: 226182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-14 07:52:20,483][00436] Avg episode reward: [(0, '28.712')]
[2025-02-14 07:52:25,475][00436] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 3891.2). Total num frames: 4939776. Throughput: 0: 1023.5. Samples: 233344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:52:25,482][00436] Avg episode reward: [(0, '29.438')]
[2025-02-14 07:52:25,487][13607] Saving new best policy, reward=29.438!
[2025-02-14 07:52:27,425][13624] Updated weights for policy 0, policy_version 1208 (0.0013)
[2025-02-14 07:52:30,473][00436] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 3878.7). Total num frames: 4956160. Throughput: 0: 1018.7. Samples: 237956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-14 07:52:30,479][00436] Avg episode reward: [(0, '29.554')]
[2025-02-14 07:52:30,492][13607] Saving new best policy, reward=29.554!
[2025-02-14 07:52:35,473][00436] Fps is (10 sec: 3687.1, 60 sec: 4096.0, 300 sec: 3883.0). Total num frames: 4976640. Throughput: 0: 1014.8. Samples: 241316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-14 07:52:35,480][00436] Avg episode reward: [(0, '29.219')]
[2025-02-14 07:52:37,506][13624] Updated weights for policy 0, policy_version 1218 (0.0022)
[2025-02-14 07:52:40,473][00436] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3903.2). Total num frames: 5001216. Throughput: 0: 1016.5. Samples: 248492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-14 07:52:40,476][00436] Avg episode reward: [(0, '28.054')]
[2025-02-14 07:52:41,361][13607] Stopping Batcher_0...
[2025-02-14 07:52:41,365][00436] Component Batcher_0 stopped!
[2025-02-14 07:52:41,367][13607] Loop batcher_evt_loop terminating...
[2025-02-14 07:52:41,373][13607] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-02-14 07:52:41,467][13624] Weights refcount: 2 0
[2025-02-14 07:52:41,479][00436] Component InferenceWorker_p0-w0 stopped!
[2025-02-14 07:52:41,483][13624] Stopping InferenceWorker_p0-w0...
[2025-02-14 07:52:41,483][13624] Loop inference_proc0-0_evt_loop terminating...
[2025-02-14 07:52:41,548][13607] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001061_4345856.pth
[2025-02-14 07:52:41,575][13607] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-02-14 07:52:41,827][00436] Component LearnerWorker_p0 stopped!
[2025-02-14 07:52:41,833][13607] Stopping LearnerWorker_p0...
[2025-02-14 07:52:41,833][13607] Loop learner_proc0_evt_loop terminating...
[2025-02-14 07:52:42,072][00436] Component RolloutWorker_w5 stopped!
[2025-02-14 07:52:42,078][13630] Stopping RolloutWorker_w5...
[2025-02-14 07:52:42,083][00436] Component RolloutWorker_w7 stopped!
[2025-02-14 07:52:42,087][13632] Stopping RolloutWorker_w7...
[2025-02-14 07:52:42,094][00436] Component RolloutWorker_w3 stopped!
[2025-02-14 07:52:42,098][13628] Stopping RolloutWorker_w3...
[2025-02-14 07:52:42,099][13628] Loop rollout_proc3_evt_loop terminating...
[2025-02-14 07:52:42,100][13630] Loop rollout_proc5_evt_loop terminating...
[2025-02-14 07:52:42,112][00436] Component RolloutWorker_w1 stopped!
[2025-02-14 07:52:42,115][13626] Stopping RolloutWorker_w1...
[2025-02-14 07:52:42,116][13626] Loop rollout_proc1_evt_loop terminating...
[2025-02-14 07:52:42,107][13632] Loop rollout_proc7_evt_loop terminating...
[2025-02-14 07:52:42,270][00436] Component RolloutWorker_w0 stopped!
[2025-02-14 07:52:42,270][13625] Stopping RolloutWorker_w0...
[2025-02-14 07:52:42,276][13625] Loop rollout_proc0_evt_loop terminating...
[2025-02-14 07:52:42,326][00436] Component RolloutWorker_w2 stopped!
[2025-02-14 07:52:42,334][13627] Stopping RolloutWorker_w2...
[2025-02-14 07:52:42,335][13627] Loop rollout_proc2_evt_loop terminating...
[2025-02-14 07:52:42,437][13631] Stopping RolloutWorker_w6...
[2025-02-14 07:52:42,437][00436] Component RolloutWorker_w6 stopped!
[2025-02-14 07:52:42,443][13629] Stopping RolloutWorker_w4...
[2025-02-14 07:52:42,443][13629] Loop rollout_proc4_evt_loop terminating...
[2025-02-14 07:52:42,443][00436] Component RolloutWorker_w4 stopped!
[2025-02-14 07:52:42,446][00436] Waiting for process learner_proc0 to stop...
[2025-02-14 07:52:42,449][13631] Loop rollout_proc6_evt_loop terminating...
[2025-02-14 07:52:44,115][00436] Waiting for process inference_proc0-0 to join...
[2025-02-14 07:52:44,174][00436] Waiting for process rollout_proc0 to join...
[2025-02-14 07:52:46,466][00436] Waiting for process rollout_proc1 to join...
[2025-02-14 07:52:46,506][00436] Waiting for process rollout_proc2 to join...
[2025-02-14 07:52:46,509][00436] Waiting for process rollout_proc3 to join...
[2025-02-14 07:52:46,511][00436] Waiting for process rollout_proc4 to join...
[2025-02-14 07:52:46,515][00436] Waiting for process rollout_proc5 to join...
[2025-02-14 07:52:46,516][00436] Waiting for process rollout_proc6 to join...
[2025-02-14 07:52:46,518][00436] Waiting for process rollout_proc7 to join...
[2025-02-14 07:52:46,522][00436] Batcher 0 profile tree view:
batching: 6.1311, releasing_batches: 0.0063
[2025-02-14 07:52:46,523][00436] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 103.2326
update_model: 2.0046
weight_update: 0.0022
one_step: 0.0104
handle_policy_step: 144.4081
deserialize: 3.4548, stack: 0.7423, obs_to_device_normalize: 30.3564, forward: 74.6529, send_messages: 7.1637
prepare_outputs: 22.0848
to_cpu: 13.6550
[2025-02-14 07:52:46,524][00436] Learner 0 profile tree view:
misc: 0.0009, prepare_batch: 4.2370
train: 20.0195
epoch_init: 0.0012, minibatch_init: 0.0014, losses_postprocess: 0.1660, kl_divergence: 0.1880, after_optimizer: 0.8447
calculate_losses: 6.5899
losses_init: 0.0008, forward_head: 0.6238, bptt_initial: 4.1340, tail: 0.3297, advantages_returns: 0.0705, losses: 0.8757
bptt: 0.4848
bptt_forward_core: 0.4517
update: 12.0735
clip: 0.2249
[2025-02-14 07:52:46,526][00436] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0736, enqueue_policy_requests: 22.8663, env_step: 200.0027, overhead: 2.9398, complete_rollouts: 2.0251
save_policy_outputs: 4.4570
split_output_tensors: 1.7724
[2025-02-14 07:52:46,527][00436] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0737, enqueue_policy_requests: 24.7789, env_step: 198.5096, overhead: 2.8235, complete_rollouts: 1.8543
save_policy_outputs: 4.2478
split_output_tensors: 1.7220
[2025-02-14 07:52:46,528][00436] Loop Runner_EvtLoop terminating...
[2025-02-14 07:52:46,530][00436] Runner profile tree view:
main_loop: 282.7091
[2025-02-14 07:52:46,531][00436] Collected {0: 5005312}, FPS: 3535.2
[2025-02-14 07:54:33,274][00436] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-14 07:54:33,276][00436] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-14 07:54:33,278][00436] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-14 07:54:33,279][00436] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-14 07:54:33,281][00436] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-14 07:54:33,283][00436] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-14 07:54:33,284][00436] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-14 07:54:33,285][00436] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-14 07:54:33,288][00436] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-14 07:54:33,290][00436] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-14 07:54:33,291][00436] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-14 07:54:33,292][00436] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-14 07:54:33,293][00436] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-14 07:54:33,296][00436] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-14 07:54:33,298][00436] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-14 07:54:33,335][00436] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:54:33,337][00436] RunningMeanStd input shape: (1,)
[2025-02-14 07:54:33,354][00436] ConvEncoder: input_channels=3
[2025-02-14 07:54:33,391][00436] Conv encoder output size: 512
[2025-02-14 07:54:33,392][00436] Policy head output size: 512
[2025-02-14 07:54:33,413][00436] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-02-14 07:54:33,841][00436] Num frames 100...
[2025-02-14 07:54:33,977][00436] Num frames 200...
[2025-02-14 07:54:34,108][00436] Num frames 300...
[2025-02-14 07:54:34,257][00436] Num frames 400...
[2025-02-14 07:54:34,401][00436] Num frames 500...
[2025-02-14 07:54:34,559][00436] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760
[2025-02-14 07:54:34,561][00436] Avg episode reward: 9.760, avg true_objective: 5.760
[2025-02-14 07:54:34,597][00436] Num frames 600...
[2025-02-14 07:54:34,738][00436] Num frames 700...
[2025-02-14 07:54:34,873][00436] Num frames 800...
[2025-02-14 07:54:35,004][00436] Num frames 900...
[2025-02-14 07:54:35,137][00436] Num frames 1000...
[2025-02-14 07:54:35,277][00436] Num frames 1100...
[2025-02-14 07:54:35,409][00436] Num frames 1200...
[2025-02-14 07:54:35,548][00436] Num frames 1300...
[2025-02-14 07:54:35,687][00436] Num frames 1400...
[2025-02-14 07:54:35,825][00436] Num frames 1500...
[2025-02-14 07:54:35,957][00436] Num frames 1600...
[2025-02-14 07:54:36,097][00436] Num frames 1700...
[2025-02-14 07:54:36,191][00436] Avg episode rewards: #0: 19.640, true rewards: #0: 8.640
[2025-02-14 07:54:36,193][00436] Avg episode reward: 19.640, avg true_objective: 8.640
[2025-02-14 07:54:36,299][00436] Num frames 1800...
[2025-02-14 07:54:36,435][00436] Num frames 1900...
[2025-02-14 07:54:36,565][00436] Num frames 2000...
[2025-02-14 07:54:36,721][00436] Num frames 2100...
[2025-02-14 07:54:36,915][00436] Num frames 2200...
[2025-02-14 07:54:37,096][00436] Num frames 2300...
[2025-02-14 07:54:37,281][00436] Num frames 2400...
[2025-02-14 07:54:37,452][00436] Num frames 2500...
[2025-02-14 07:54:37,630][00436] Num frames 2600...
[2025-02-14 07:54:37,760][00436] Avg episode rewards: #0: 20.807, true rewards: #0: 8.807
[2025-02-14 07:54:37,762][00436] Avg episode reward: 20.807, avg true_objective: 8.807
[2025-02-14 07:54:37,870][00436] Num frames 2700...
[2025-02-14 07:54:38,039][00436] Num frames 2800...
[2025-02-14 07:54:38,232][00436] Num frames 2900...
[2025-02-14 07:54:38,424][00436] Num frames 3000...
[2025-02-14 07:54:38,609][00436] Num frames 3100...
[2025-02-14 07:54:38,793][00436] Num frames 3200...
[2025-02-14 07:54:38,986][00436] Num frames 3300...
[2025-02-14 07:54:39,130][00436] Num frames 3400...
[2025-02-14 07:54:39,268][00436] Num frames 3500...
[2025-02-14 07:54:39,403][00436] Num frames 3600...
[2025-02-14 07:54:39,531][00436] Num frames 3700...
[2025-02-14 07:54:39,665][00436] Num frames 3800...
[2025-02-14 07:54:39,766][00436] Avg episode rewards: #0: 22.330, true rewards: #0: 9.580
[2025-02-14 07:54:39,767][00436] Avg episode reward: 22.330, avg true_objective: 9.580
[2025-02-14 07:54:39,864][00436] Num frames 3900...
[2025-02-14 07:54:39,993][00436] Num frames 4000...
[2025-02-14 07:54:40,125][00436] Num frames 4100...
[2025-02-14 07:54:40,264][00436] Num frames 4200...
[2025-02-14 07:54:40,400][00436] Num frames 4300...
[2025-02-14 07:54:40,530][00436] Num frames 4400...
[2025-02-14 07:54:40,598][00436] Avg episode rewards: #0: 19.816, true rewards: #0: 8.816
[2025-02-14 07:54:40,599][00436] Avg episode reward: 19.816, avg true_objective: 8.816
[2025-02-14 07:54:40,727][00436] Num frames 4500...
[2025-02-14 07:54:40,865][00436] Num frames 4600...
[2025-02-14 07:54:41,002][00436] Num frames 4700...
[2025-02-14 07:54:41,131][00436] Num frames 4800...
[2025-02-14 07:54:41,269][00436] Num frames 4900...
[2025-02-14 07:54:41,401][00436] Num frames 5000...
[2025-02-14 07:54:41,530][00436] Num frames 5100...
[2025-02-14 07:54:41,661][00436] Num frames 5200...
[2025-02-14 07:54:41,793][00436] Num frames 5300...
[2025-02-14 07:54:41,854][00436] Avg episode rewards: #0: 19.840, true rewards: #0: 8.840
[2025-02-14 07:54:41,856][00436] Avg episode reward: 19.840, avg true_objective: 8.840
[2025-02-14 07:54:41,991][00436] Num frames 5400...
[2025-02-14 07:54:42,124][00436] Num frames 5500...
[2025-02-14 07:54:42,265][00436] Num frames 5600...
[2025-02-14 07:54:42,400][00436] Num frames 5700...
[2025-02-14 07:54:42,535][00436] Num frames 5800...
[2025-02-14 07:54:42,667][00436] Num frames 5900...
[2025-02-14 07:54:42,797][00436] Num frames 6000...
[2025-02-14 07:54:42,932][00436] Num frames 6100...
[2025-02-14 07:54:43,073][00436] Num frames 6200...
[2025-02-14 07:54:43,212][00436] Num frames 6300...
[2025-02-14 07:54:43,343][00436] Num frames 6400...
[2025-02-14 07:54:43,431][00436] Avg episode rewards: #0: 21.034, true rewards: #0: 9.177
[2025-02-14 07:54:43,432][00436] Avg episode reward: 21.034, avg true_objective: 9.177
[2025-02-14 07:54:43,531][00436] Num frames 6500...
[2025-02-14 07:54:43,660][00436] Num frames 6600...
[2025-02-14 07:54:43,788][00436] Num frames 6700...
[2025-02-14 07:54:43,920][00436] Num frames 6800...
[2025-02-14 07:54:44,059][00436] Num frames 6900...
[2025-02-14 07:54:44,198][00436] Num frames 7000...
[2025-02-14 07:54:44,329][00436] Num frames 7100...
[2025-02-14 07:54:44,459][00436] Num frames 7200...
[2025-02-14 07:54:44,589][00436] Num frames 7300...
[2025-02-14 07:54:44,718][00436] Num frames 7400...
[2025-02-14 07:54:44,848][00436] Num frames 7500...
[2025-02-14 07:54:44,983][00436] Num frames 7600...
[2025-02-14 07:54:45,136][00436] Avg episode rewards: #0: 21.340, true rewards: #0: 9.590
[2025-02-14 07:54:45,137][00436] Avg episode reward: 21.340, avg true_objective: 9.590
[2025-02-14 07:54:45,180][00436] Num frames 7700...
[2025-02-14 07:54:45,316][00436] Num frames 7800...
[2025-02-14 07:54:45,456][00436] Num frames 7900...
[2025-02-14 07:54:45,593][00436] Num frames 8000...
[2025-02-14 07:54:45,728][00436] Num frames 8100...
[2025-02-14 07:54:45,860][00436] Num frames 8200...
[2025-02-14 07:54:45,992][00436] Num frames 8300...
[2025-02-14 07:54:46,130][00436] Num frames 8400...
[2025-02-14 07:54:46,270][00436] Num frames 8500...
[2025-02-14 07:54:46,404][00436] Num frames 8600...
[2025-02-14 07:54:46,536][00436] Num frames 8700...
[2025-02-14 07:54:46,672][00436] Avg episode rewards: #0: 21.622, true rewards: #0: 9.733
[2025-02-14 07:54:46,674][00436] Avg episode reward: 21.622, avg true_objective: 9.733
[2025-02-14 07:54:46,731][00436] Num frames 8800...
[2025-02-14 07:54:46,865][00436] Num frames 8900...
[2025-02-14 07:54:47,000][00436] Num frames 9000...
[2025-02-14 07:54:47,140][00436] Num frames 9100...
[2025-02-14 07:54:47,281][00436] Num frames 9200...
[2025-02-14 07:54:47,421][00436] Num frames 9300...
[2025-02-14 07:54:47,555][00436] Num frames 9400...
[2025-02-14 07:54:47,686][00436] Num frames 9500...
[2025-02-14 07:54:47,823][00436] Num frames 9600...
[2025-02-14 07:54:47,959][00436] Num frames 9700...
[2025-02-14 07:54:48,100][00436] Num frames 9800...
[2025-02-14 07:54:48,244][00436] Num frames 9900...
[2025-02-14 07:54:48,387][00436] Num frames 10000...
[2025-02-14 07:54:48,527][00436] Num frames 10100...
[2025-02-14 07:54:48,591][00436] Avg episode rewards: #0: 22.404, true rewards: #0: 10.104
[2025-02-14 07:54:48,594][00436] Avg episode reward: 22.404, avg true_objective: 10.104
[2025-02-14 07:55:49,275][00436] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-14 07:56:08,753][00436] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-14 07:56:08,755][00436] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-14 07:56:08,757][00436] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-14 07:56:08,758][00436] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-14 07:56:08,761][00436] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-14 07:56:08,763][00436] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-14 07:56:08,764][00436] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-14 07:56:08,768][00436] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-14 07:56:08,769][00436] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-14 07:56:08,770][00436] Adding new argument 'hf_repository'='gyaan/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-14 07:56:08,771][00436] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-14 07:56:08,775][00436] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-14 07:56:08,777][00436] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-14 07:56:08,778][00436] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-14 07:56:08,779][00436] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-14 07:56:08,813][00436] RunningMeanStd input shape: (3, 72, 128)
[2025-02-14 07:56:08,815][00436] RunningMeanStd input shape: (1,)
[2025-02-14 07:56:08,828][00436] ConvEncoder: input_channels=3
[2025-02-14 07:56:08,864][00436] Conv encoder output size: 512
[2025-02-14 07:56:08,865][00436] Policy head output size: 512
[2025-02-14 07:56:08,883][00436] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-02-14 07:56:09,360][00436] Num frames 100...
[2025-02-14 07:56:09,497][00436] Num frames 200...
[2025-02-14 07:56:09,624][00436] Num frames 300...
[2025-02-14 07:56:09,761][00436] Num frames 400...
[2025-02-14 07:56:09,899][00436] Num frames 500...
[2025-02-14 07:56:10,028][00436] Num frames 600...
[2025-02-14 07:56:10,169][00436] Num frames 700...
[2025-02-14 07:56:10,310][00436] Num frames 800...
[2025-02-14 07:56:10,441][00436] Num frames 900...
[2025-02-14 07:56:10,571][00436] Num frames 1000...
[2025-02-14 07:56:10,705][00436] Num frames 1100...
[2025-02-14 07:56:10,854][00436] Num frames 1200...
[2025-02-14 07:56:10,986][00436] Num frames 1300...
[2025-02-14 07:56:11,114][00436] Num frames 1400...
[2025-02-14 07:56:11,249][00436] Num frames 1500...
[2025-02-14 07:56:11,386][00436] Num frames 1600...
[2025-02-14 07:56:11,439][00436] Avg episode rewards: #0: 45.000, true rewards: #0: 16.000
[2025-02-14 07:56:11,440][00436] Avg episode reward: 45.000, avg true_objective: 16.000
[2025-02-14 07:56:11,570][00436] Num frames 1700...
[2025-02-14 07:56:11,699][00436] Num frames 1800...
[2025-02-14 07:56:11,827][00436] Num frames 1900...
[2025-02-14 07:56:11,973][00436] Num frames 2000...
[2025-02-14 07:56:12,051][00436] Avg episode rewards: #0: 25.580, true rewards: #0: 10.080
[2025-02-14 07:56:12,052][00436] Avg episode reward: 25.580, avg true_objective: 10.080
[2025-02-14 07:56:12,171][00436] Num frames 2100...
[2025-02-14 07:56:12,309][00436] Num frames 2200...
[2025-02-14 07:56:12,439][00436] Num frames 2300...
[2025-02-14 07:56:12,571][00436] Num frames 2400...
[2025-02-14 07:56:12,704][00436] Num frames 2500...
[2025-02-14 07:56:12,840][00436] Num frames 2600...
[2025-02-14 07:56:12,979][00436] Num frames 2700...
[2025-02-14 07:56:13,112][00436] Num frames 2800...
[2025-02-14 07:56:13,252][00436] Num frames 2900...
[2025-02-14 07:56:13,386][00436] Num frames 3000...
[2025-02-14 07:56:13,518][00436] Num frames 3100...
[2025-02-14 07:56:13,649][00436] Num frames 3200...
[2025-02-14 07:56:13,784][00436] Num frames 3300...
[2025-02-14 07:56:13,928][00436] Num frames 3400...
[2025-02-14 07:56:14,061][00436] Num frames 3500...
[2025-02-14 07:56:14,198][00436] Num frames 3600...
[2025-02-14 07:56:14,331][00436] Num frames 3700...
[2025-02-14 07:56:14,462][00436] Num frames 3800...
[2025-02-14 07:56:14,596][00436] Num frames 3900...
[2025-02-14 07:56:14,727][00436] Num frames 4000...
[2025-02-14 07:56:14,857][00436] Num frames 4100...
[2025-02-14 07:56:14,933][00436] Avg episode rewards: #0: 36.053, true rewards: #0: 13.720
[2025-02-14 07:56:14,935][00436] Avg episode reward: 36.053, avg true_objective: 13.720
[2025-02-14 07:56:15,042][00436] Num frames 4200...
[2025-02-14 07:56:15,177][00436] Num frames 4300...
[2025-02-14 07:56:15,315][00436] Num frames 4400...
[2025-02-14 07:56:15,445][00436] Num frames 4500...
[2025-02-14 07:56:15,590][00436] Num frames 4600...
[2025-02-14 07:56:15,724][00436] Num frames 4700...
[2025-02-14 07:56:15,854][00436] Num frames 4800...
[2025-02-14 07:56:15,937][00436] Avg episode rewards: #0: 30.550, true rewards: #0: 12.050
[2025-02-14 07:56:15,939][00436] Avg episode reward: 30.550, avg true_objective: 12.050
[2025-02-14 07:56:16,077][00436] Num frames 4900...
[2025-02-14 07:56:16,266][00436] Num frames 5000...
[2025-02-14 07:56:16,437][00436] Num frames 5100...
[2025-02-14 07:56:16,613][00436] Num frames 5200...
[2025-02-14 07:56:16,781][00436] Num frames 5300...
[2025-02-14 07:56:17,006][00436] Avg episode rewards: #0: 26.392, true rewards: #0: 10.792
[2025-02-14 07:56:17,008][00436] Avg episode reward: 26.392, avg true_objective: 10.792
[2025-02-14 07:56:17,021][00436] Num frames 5400...
[2025-02-14 07:56:17,192][00436] Num frames 5500...
[2025-02-14 07:56:17,363][00436] Num frames 5600...
[2025-02-14 07:56:17,544][00436] Num frames 5700...
[2025-02-14 07:56:17,733][00436] Num frames 5800...
[2025-02-14 07:56:17,914][00436] Num frames 5900...
[2025-02-14 07:56:18,102][00436] Num frames 6000...
[2025-02-14 07:56:18,289][00436] Num frames 6100...
[2025-02-14 07:56:18,424][00436] Num frames 6200...
[2025-02-14 07:56:18,555][00436] Num frames 6300...
[2025-02-14 07:56:18,687][00436] Num frames 6400...
[2025-02-14 07:56:18,822][00436] Num frames 6500...
[2025-02-14 07:56:18,955][00436] Num frames 6600...
[2025-02-14 07:56:19,095][00436] Num frames 6700...
[2025-02-14 07:56:19,236][00436] Num frames 6800...
[2025-02-14 07:56:19,298][00436] Avg episode rewards: #0: 28.173, true rewards: #0: 11.340
[2025-02-14 07:56:19,299][00436] Avg episode reward: 28.173, avg true_objective: 11.340
[2025-02-14 07:56:19,425][00436] Num frames 6900...
[2025-02-14 07:56:19,553][00436] Num frames 7000...
[2025-02-14 07:56:19,684][00436] Num frames 7100...
[2025-02-14 07:56:19,815][00436] Num frames 7200...
[2025-02-14 07:56:19,957][00436] Num frames 7300...
[2025-02-14 07:56:20,104][00436] Num frames 7400...
[2025-02-14 07:56:20,243][00436] Num frames 7500...
[2025-02-14 07:56:20,374][00436] Num frames 7600...
[2025-02-14 07:56:20,512][00436] Num frames 7700...
[2025-02-14 07:56:20,647][00436] Num frames 7800...
[2025-02-14 07:56:20,780][00436] Num frames 7900...
[2025-02-14 07:56:20,913][00436] Num frames 8000...
[2025-02-14 07:56:20,983][00436] Avg episode rewards: #0: 27.871, true rewards: #0: 11.443
[2025-02-14 07:56:20,985][00436] Avg episode reward: 27.871, avg true_objective: 11.443
[2025-02-14 07:56:21,112][00436] Num frames 8100...
[2025-02-14 07:56:21,249][00436] Num frames 8200...
[2025-02-14 07:56:21,380][00436] Num frames 8300...
[2025-02-14 07:56:21,515][00436] Num frames 8400...
[2025-02-14 07:56:21,649][00436] Num frames 8500...
[2025-02-14 07:56:21,779][00436] Num frames 8600...
[2025-02-14 07:56:21,912][00436] Num frames 8700...
[2025-02-14 07:56:22,046][00436] Num frames 8800...
[2025-02-14 07:56:22,194][00436] Num frames 8900...
[2025-02-14 07:56:22,327][00436] Num frames 9000...
[2025-02-14 07:56:22,505][00436] Avg episode rewards: #0: 28.237, true rewards: #0: 11.362
[2025-02-14 07:56:22,507][00436] Avg episode reward: 28.237, avg true_objective: 11.362
[2025-02-14 07:56:22,525][00436] Num frames 9100...
[2025-02-14 07:56:22,656][00436] Num frames 9200...
[2025-02-14 07:56:22,788][00436] Num frames 9300...
[2025-02-14 07:56:22,925][00436] Num frames 9400...
[2025-02-14 07:56:23,055][00436] Num frames 9500...
[2025-02-14 07:56:23,203][00436] Num frames 9600...
[2025-02-14 07:56:23,335][00436] Num frames 9700...
[2025-02-14 07:56:23,468][00436] Num frames 9800...
[2025-02-14 07:56:23,599][00436] Num frames 9900...
[2025-02-14 07:56:23,731][00436] Num frames 10000...
[2025-02-14 07:56:23,862][00436] Num frames 10100...
[2025-02-14 07:56:23,994][00436] Num frames 10200...
[2025-02-14 07:56:24,105][00436] Avg episode rewards: #0: 27.935, true rewards: #0: 11.380
[2025-02-14 07:56:24,107][00436] Avg episode reward: 27.935, avg true_objective: 11.380
[2025-02-14 07:56:24,194][00436] Num frames 10300...
[2025-02-14 07:56:24,330][00436] Num frames 10400...
[2025-02-14 07:56:24,462][00436] Num frames 10500...
[2025-02-14 07:56:24,591][00436] Num frames 10600...
[2025-02-14 07:56:24,722][00436] Num frames 10700...
[2025-02-14 07:56:24,857][00436] Num frames 10800...
[2025-02-14 07:56:24,990][00436] Num frames 10900...
[2025-02-14 07:56:25,126][00436] Num frames 11000...
[2025-02-14 07:56:25,279][00436] Num frames 11100...
[2025-02-14 07:56:25,413][00436] Num frames 11200...
[2025-02-14 07:56:25,548][00436] Num frames 11300...
[2025-02-14 07:56:25,685][00436] Num frames 11400...
[2025-02-14 07:56:25,816][00436] Num frames 11500...
[2025-02-14 07:56:25,949][00436] Num frames 11600...
[2025-02-14 07:56:26,076][00436] Num frames 11700...
[2025-02-14 07:56:26,216][00436] Num frames 11800...
[2025-02-14 07:56:26,357][00436] Num frames 11900...
[2025-02-14 07:56:26,463][00436] Avg episode rewards: #0: 29.138, true rewards: #0: 11.938
[2025-02-14 07:56:26,466][00436] Avg episode reward: 29.138, avg true_objective: 11.938
[2025-02-14 07:57:38,965][00436] Replay video saved to /content/train_dir/default_experiment/replay.mp4!