[2023-02-26 15:35:06,459][00108] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-26 15:35:06,462][00108] Rollout worker 0 uses device cpu
[2023-02-26 15:35:06,463][00108] Rollout worker 1 uses device cpu
[2023-02-26 15:35:06,465][00108] Rollout worker 2 uses device cpu
[2023-02-26 15:35:06,466][00108] Rollout worker 3 uses device cpu
[2023-02-26 15:35:06,468][00108] Rollout worker 4 uses device cpu
[2023-02-26 15:35:06,469][00108] Rollout worker 5 uses device cpu
[2023-02-26 15:35:06,471][00108] Rollout worker 6 uses device cpu
[2023-02-26 15:35:06,473][00108] Rollout worker 7 uses device cpu
[2023-02-26 15:35:06,662][00108] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 15:35:06,664][00108] InferenceWorker_p0-w0: min num requests: 2
[2023-02-26 15:35:06,695][00108] Starting all processes...
[2023-02-26 15:35:06,697][00108] Starting process learner_proc0
[2023-02-26 15:35:06,751][00108] Starting all processes...
[2023-02-26 15:35:06,760][00108] Starting process inference_proc0-0
[2023-02-26 15:35:06,760][00108] Starting process rollout_proc0
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc1
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc2
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc3
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc4
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc5
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc6
[2023-02-26 15:35:06,762][00108] Starting process rollout_proc7
[2023-02-26 15:35:15,918][19044] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 15:35:15,918][19044] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-26 15:35:16,755][19063] Worker 4 uses CPU cores [0]
[2023-02-26 15:35:16,767][19065] Worker 6 uses CPU cores [0]
[2023-02-26 15:35:16,771][19066] Worker 7 uses CPU cores [1]
[2023-02-26 15:35:16,768][19059] Worker 0 uses CPU cores [0]
[2023-02-26 15:35:16,843][19064] Worker 5 uses CPU cores [1]
[2023-02-26 15:35:16,849][19060] Worker 1 uses CPU cores [1]
[2023-02-26 15:35:16,981][19058] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 15:35:16,982][19058] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-26 15:35:17,069][19062] Worker 3 uses CPU cores [1]
[2023-02-26 15:35:17,239][19061] Worker 2 uses CPU cores [0]
[2023-02-26 15:35:17,356][19058] Num visible devices: 1
[2023-02-26 15:35:17,357][19044] Num visible devices: 1
[2023-02-26 15:35:17,367][19044] Starting seed is not provided
[2023-02-26 15:35:17,368][19044] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 15:35:17,368][19044] Initializing actor-critic model on device cuda:0
[2023-02-26 15:35:17,369][19044] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 15:35:17,372][19044] RunningMeanStd input shape: (1,)
[2023-02-26 15:35:17,392][19044] ConvEncoder: input_channels=3
[2023-02-26 15:35:17,730][19044] Conv encoder output size: 512
[2023-02-26 15:35:17,730][19044] Policy head output size: 512
[2023-02-26 15:35:17,789][19044] Created Actor Critic model with architecture:
[2023-02-26 15:35:17,789][19044] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 15:35:25,130][19044] Using optimizer
[2023-02-26 15:35:25,132][19044] No checkpoints found
[2023-02-26 15:35:25,132][19044] Did not load from checkpoint, starting from scratch!
[2023-02-26 15:35:25,132][19044] Initialized policy 0 weights for model version 0
[2023-02-26 15:35:25,139][19044] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 15:35:25,150][19044] LearnerWorker_p0 finished initialization!
[2023-02-26 15:35:25,340][19058] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 15:35:25,341][19058] RunningMeanStd input shape: (1,)
[2023-02-26 15:35:25,360][19058] ConvEncoder: input_channels=3
[2023-02-26 15:35:25,464][19058] Conv encoder output size: 512
[2023-02-26 15:35:25,464][19058] Policy head output size: 512
[2023-02-26 15:35:25,929][00108] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 15:35:26,655][00108] Heartbeat connected on Batcher_0
[2023-02-26 15:35:26,664][00108] Heartbeat connected on LearnerWorker_p0
[2023-02-26 15:35:26,672][00108] Heartbeat connected on RolloutWorker_w0
[2023-02-26 15:35:26,678][00108] Heartbeat connected on RolloutWorker_w1
[2023-02-26 15:35:26,680][00108] Heartbeat connected on RolloutWorker_w2
[2023-02-26 15:35:26,684][00108] Heartbeat connected on RolloutWorker_w3
[2023-02-26 15:35:26,689][00108] Heartbeat connected on RolloutWorker_w4
[2023-02-26 15:35:26,693][00108] Heartbeat connected on RolloutWorker_w5
[2023-02-26 15:35:26,697][00108] Heartbeat connected on RolloutWorker_w6
[2023-02-26 15:35:26,701][00108] Heartbeat connected on RolloutWorker_w7
[2023-02-26 15:35:27,717][00108] Inference worker 0-0 is ready!
[2023-02-26 15:35:27,719][00108] All inference workers are ready! Signal rollout workers to start!
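The module tree printed above can be reconstructed as a minimal PyTorch sketch. The conv kernel sizes and strides are assumptions (Sample Factory's default VizDoom encoder uses an Atari-style 32x8s4 / 64x4s2 / 128x3s2 stack); everything else follows the log directly: (3, 72, 128) observations, a 512-dim encoder output, a GRU(512, 512) core, a 1-unit critic head, and 5 action logits.

```python
# A minimal sketch of the logged ActorCriticSharedWeights model.
# Conv hyperparameters are assumptions, not read from the log.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        self.conv_head = nn.Sequential(                 # ConvEncoderImpl.conv_head
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy observation
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)              # ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)       # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```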
[2023-02-26 15:35:27,727][00108] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-26 15:35:27,810][19065] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,815][19059] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,821][19061] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,822][19063] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,870][19066] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,880][19062] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,889][19060] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:27,886][19064] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 15:35:28,375][19064] Decorrelating experience for 0 frames...
[2023-02-26 15:35:28,746][19064] Decorrelating experience for 32 frames...
[2023-02-26 15:35:29,155][19064] Decorrelating experience for 64 frames...
[2023-02-26 15:35:29,273][19061] Decorrelating experience for 0 frames...
[2023-02-26 15:35:29,275][19059] Decorrelating experience for 0 frames...
[2023-02-26 15:35:29,277][19063] Decorrelating experience for 0 frames...
[2023-02-26 15:35:29,281][19065] Decorrelating experience for 0 frames...
[2023-02-26 15:35:30,020][19066] Decorrelating experience for 0 frames...
[2023-02-26 15:35:30,068][19060] Decorrelating experience for 0 frames...
[2023-02-26 15:35:30,932][19059] Decorrelating experience for 32 frames...
[2023-02-26 15:35:30,929][00108] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 15:35:30,942][19063] Decorrelating experience for 32 frames...
[2023-02-26 15:35:30,952][19061] Decorrelating experience for 32 frames...
[2023-02-26 15:35:30,976][19065] Decorrelating experience for 32 frames...
[2023-02-26 15:35:31,718][19066] Decorrelating experience for 32 frames...
[2023-02-26 15:35:31,772][19062] Decorrelating experience for 0 frames...
[2023-02-26 15:35:31,818][19060] Decorrelating experience for 32 frames...
[2023-02-26 15:35:32,235][19064] Decorrelating experience for 96 frames...
[2023-02-26 15:35:32,900][19062] Decorrelating experience for 32 frames...
[2023-02-26 15:35:33,738][19061] Decorrelating experience for 64 frames...
[2023-02-26 15:35:33,752][19065] Decorrelating experience for 64 frames...
[2023-02-26 15:35:33,825][19059] Decorrelating experience for 64 frames...
[2023-02-26 15:35:34,029][19063] Decorrelating experience for 64 frames...
[2023-02-26 15:35:34,559][19062] Decorrelating experience for 64 frames...
[2023-02-26 15:35:34,912][19066] Decorrelating experience for 64 frames...
[2023-02-26 15:35:35,192][19060] Decorrelating experience for 64 frames...
[2023-02-26 15:35:35,598][19062] Decorrelating experience for 96 frames...
[2023-02-26 15:35:35,864][19060] Decorrelating experience for 96 frames...
[2023-02-26 15:35:35,929][00108] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 15:35:37,231][19065] Decorrelating experience for 96 frames...
[2023-02-26 15:35:37,233][19061] Decorrelating experience for 96 frames...
[2023-02-26 15:35:37,258][19059] Decorrelating experience for 96 frames...
[2023-02-26 15:35:37,604][19063] Decorrelating experience for 96 frames...
[2023-02-26 15:35:38,639][19066] Decorrelating experience for 96 frames...
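The "Decorrelating experience for N frames..." lines show each of the eight rollout workers warming up in staggered chunks before real collection starts, so the parallel environment copies do not emit near-identical, synchronized trajectories. A conceptual sketch of that warm-up, not Sample Factory's actual implementation:

```python
# Illustrative decorrelation warm-up for one rollout worker,
# assuming the classic Gym step API (obs, reward, done, info).
def decorrelate(env, num_chunks=4, rollout_len=32):
    obs = env.reset()
    for chunk in range(num_chunks):
        # Matches the cumulative 0/32/64/96-frame progress lines above.
        print(f"Decorrelating experience for {chunk * rollout_len} frames...")
        for _ in range(rollout_len):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                obs = env.reset()
```

Because workers start at different wall-clock times and take random actions during warm-up, their subsequent rollouts are desynchronized.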
[2023-02-26 15:35:40,929][00108] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 116.1. Samples: 1742. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 15:35:40,936][00108] Avg episode reward: [(0, '2.109')]
[2023-02-26 15:35:41,074][19044] Signal inference workers to stop experience collection...
[2023-02-26 15:35:41,093][19058] InferenceWorker_p0-w0: stopping experience collection
[2023-02-26 15:35:43,545][19044] Signal inference workers to resume experience collection...
[2023-02-26 15:35:43,546][19058] InferenceWorker_p0-w0: resuming experience collection
[2023-02-26 15:35:45,929][00108] Fps is (10 sec: 1228.8, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 224.5. Samples: 4490. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-26 15:35:45,932][00108] Avg episode reward: [(0, '3.200')]
[2023-02-26 15:35:50,931][00108] Fps is (10 sec: 3276.4, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 305.6. Samples: 7640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-26 15:35:50,933][00108] Avg episode reward: [(0, '3.735')]
[2023-02-26 15:35:53,360][19058] Updated weights for policy 0, policy_version 10 (0.0023)
[2023-02-26 15:35:55,931][00108] Fps is (10 sec: 3276.2, 60 sec: 1501.8, 300 sec: 1501.8). Total num frames: 45056. Throughput: 0: 401.5. Samples: 12046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-26 15:35:55,940][00108] Avg episode reward: [(0, '4.201')]
[2023-02-26 15:36:00,929][00108] Fps is (10 sec: 3277.2, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 503.0. Samples: 17604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-26 15:36:00,937][00108] Avg episode reward: [(0, '4.336')]
[2023-02-26 15:36:03,775][19058] Updated weights for policy 0, policy_version 20 (0.0023)
[2023-02-26 15:36:05,932][00108] Fps is (10 sec: 4505.0, 60 sec: 2252.6, 300 sec: 2252.6). Total num frames: 90112. Throughput: 0: 528.5. Samples: 21140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-26 15:36:05,935][00108] Avg episode reward: [(0, '4.353')]
[2023-02-26 15:36:10,929][00108] Fps is (10 sec: 4096.0, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 609.2. Samples: 27414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-26 15:36:10,937][00108] Avg episode reward: [(0, '4.374')]
[2023-02-26 15:36:10,993][19044] Saving new best policy, reward=4.374!
[2023-02-26 15:36:15,106][19058] Updated weights for policy 0, policy_version 30 (0.0014)
[2023-02-26 15:36:15,930][00108] Fps is (10 sec: 3277.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 703.2. Samples: 31646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:36:15,935][00108] Avg episode reward: [(0, '4.482')]
[2023-02-26 15:36:15,938][19044] Saving new best policy, reward=4.482!
[2023-02-26 15:36:20,929][00108] Fps is (10 sec: 3686.4, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 761.5. Samples: 34268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:36:20,935][00108] Avg episode reward: [(0, '4.396')]
[2023-02-26 15:36:24,785][19058] Updated weights for policy 0, policy_version 40 (0.0024)
[2023-02-26 15:36:25,929][00108] Fps is (10 sec: 4505.9, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 882.7. Samples: 41462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:36:25,938][00108] Avg episode reward: [(0, '4.280')]
[2023-02-26 15:36:30,931][00108] Fps is (10 sec: 4095.2, 60 sec: 3071.9, 300 sec: 2835.6). Total num frames: 184320. Throughput: 0: 953.3. Samples: 47388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 15:36:30,936][00108] Avg episode reward: [(0, '4.171')]
[2023-02-26 15:36:35,930][00108] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 933.5. Samples: 49646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:36:35,937][00108] Avg episode reward: [(0, '4.252')]
[2023-02-26 15:36:36,394][19058] Updated weights for policy 0, policy_version 50 (0.0027)
[2023-02-26 15:36:40,929][00108] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 955.5. Samples: 55042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:36:40,935][00108] Avg episode reward: [(0, '4.281')]
[2023-02-26 15:36:45,841][19058] Updated weights for policy 0, policy_version 60 (0.0021)
[2023-02-26 15:36:45,929][00108] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 988.0. Samples: 62064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:36:45,931][00108] Avg episode reward: [(0, '4.380')]
[2023-02-26 15:36:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 978.5. Samples: 65168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:36:50,933][00108] Avg episode reward: [(0, '4.563')]
[2023-02-26 15:36:50,947][19044] Saving new best policy, reward=4.563!
[2023-02-26 15:36:55,929][00108] Fps is (10 sec: 3276.7, 60 sec: 3891.3, 300 sec: 3094.8). Total num frames: 278528. Throughput: 0: 937.7. Samples: 69612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:36:55,938][00108] Avg episode reward: [(0, '4.631')]
[2023-02-26 15:36:55,947][19044] Saving new best policy, reward=4.631!
[2023-02-26 15:36:58,184][19058] Updated weights for policy 0, policy_version 70 (0.0016)
[2023-02-26 15:37:00,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 967.8. Samples: 75196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 15:37:00,936][00108] Avg episode reward: [(0, '4.616')]
[2023-02-26 15:37:00,956][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2023-02-26 15:37:05,929][00108] Fps is (10 sec: 4096.1, 60 sec: 3823.1, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 985.0. Samples: 78594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:05,937][00108] Avg episode reward: [(0, '4.424')]
[2023-02-26 15:37:06,974][19058] Updated weights for policy 0, policy_version 80 (0.0020)
[2023-02-26 15:37:10,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 967.9. Samples: 85018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:10,933][00108] Avg episode reward: [(0, '4.382')]
[2023-02-26 15:37:15,932][00108] Fps is (10 sec: 3685.5, 60 sec: 3891.1, 300 sec: 3239.5). Total num frames: 356352. Throughput: 0: 935.0. Samples: 89464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:15,941][00108] Avg episode reward: [(0, '4.268')]
[2023-02-26 15:37:19,119][19058] Updated weights for policy 0, policy_version 90 (0.0032)
[2023-02-26 15:37:20,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 945.6. Samples: 92196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:20,931][00108] Avg episode reward: [(0, '4.384')]
[2023-02-26 15:37:25,929][00108] Fps is (10 sec: 4096.9, 60 sec: 3822.9, 300 sec: 3310.9). Total num frames: 397312. Throughput: 0: 987.0. Samples: 99456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:37:25,931][00108] Avg episode reward: [(0, '4.634')]
[2023-02-26 15:37:25,948][19044] Saving new best policy, reward=4.634!
[2023-02-26 15:37:27,702][19058] Updated weights for policy 0, policy_version 100 (0.0012)
[2023-02-26 15:37:30,930][00108] Fps is (10 sec: 4095.8, 60 sec: 3891.3, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 958.7. Samples: 105208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:30,933][00108] Avg episode reward: [(0, '4.701')]
[2023-02-26 15:37:30,946][19044] Saving new best policy, reward=4.701!
[2023-02-26 15:37:35,929][00108] Fps is (10 sec: 3276.9, 60 sec: 3823.0, 300 sec: 3308.3). Total num frames: 430080. Throughput: 0: 938.7. Samples: 107410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:35,936][00108] Avg episode reward: [(0, '4.742')]
[2023-02-26 15:37:35,938][19044] Saving new best policy, reward=4.742!
[2023-02-26 15:37:40,032][19058] Updated weights for policy 0, policy_version 110 (0.0022)
[2023-02-26 15:37:40,929][00108] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3367.8). Total num frames: 454656. Throughput: 0: 964.1. Samples: 112996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:37:40,931][00108] Avg episode reward: [(0, '4.576')]
[2023-02-26 15:37:45,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 996.0. Samples: 120014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:37:45,932][00108] Avg episode reward: [(0, '4.578')]
[2023-02-26 15:37:49,700][19058] Updated weights for policy 0, policy_version 120 (0.0014)
[2023-02-26 15:37:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 987.2. Samples: 123016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-26 15:37:50,932][00108] Avg episode reward: [(0, '4.706')]
[2023-02-26 15:37:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3386.0). Total num frames: 507904. Throughput: 0: 942.0. Samples: 127408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:37:55,936][00108] Avg episode reward: [(0, '4.589')]
[2023-02-26 15:38:00,923][19058] Updated weights for policy 0, policy_version 130 (0.0012)
[2023-02-26 15:38:00,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3435.4). Total num frames: 532480. Throughput: 0: 976.0. Samples: 133382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:38:00,931][00108] Avg episode reward: [(0, '4.620')]
[2023-02-26 15:38:05,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 993.3. Samples: 136894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-26 15:38:05,932][00108] Avg episode reward: [(0, '4.652')]
[2023-02-26 15:38:10,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 963.9. Samples: 142830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:38:10,932][00108] Avg episode reward: [(0, '4.715')]
[2023-02-26 15:38:11,509][19058] Updated weights for policy 0, policy_version 140 (0.0024)
[2023-02-26 15:38:15,930][00108] Fps is (10 sec: 3276.5, 60 sec: 3823.0, 300 sec: 3445.4). Total num frames: 585728. Throughput: 0: 934.3. Samples: 147254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:38:15,938][00108] Avg episode reward: [(0, '4.790')]
[2023-02-26 15:38:15,940][19044] Saving new best policy, reward=4.790!
[2023-02-26 15:38:20,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3464.0). Total num frames: 606208. Throughput: 0: 950.4. Samples: 150178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:38:20,942][00108] Avg episode reward: [(0, '4.740')]
[2023-02-26 15:38:22,144][19058] Updated weights for policy 0, policy_version 150 (0.0026)
[2023-02-26 15:38:25,929][00108] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 985.9. Samples: 157362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:38:25,935][00108] Avg episode reward: [(0, '4.756')]
[2023-02-26 15:38:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3498.2). Total num frames: 647168. Throughput: 0: 955.6. Samples: 163014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:38:30,940][00108] Avg episode reward: [(0, '4.878')]
[2023-02-26 15:38:30,951][19044] Saving new best policy, reward=4.878!
[2023-02-26 15:38:32,715][19058] Updated weights for policy 0, policy_version 160 (0.0025)
[2023-02-26 15:38:35,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 936.9. Samples: 165178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:38:35,936][00108] Avg episode reward: [(0, '4.722')]
[2023-02-26 15:38:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3507.9). Total num frames: 684032. Throughput: 0: 970.6. Samples: 171084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:38:40,937][00108] Avg episode reward: [(0, '4.645')]
[2023-02-26 15:38:42,766][19058] Updated weights for policy 0, policy_version 170 (0.0013)
[2023-02-26 15:38:45,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3543.0). Total num frames: 708608. Throughput: 0: 998.0. Samples: 178294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:38:45,934][00108] Avg episode reward: [(0, '4.825')]
[2023-02-26 15:38:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3536.5). Total num frames: 724992. Throughput: 0: 982.6. Samples: 181112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:38:50,933][00108] Avg episode reward: [(0, '4.969')]
[2023-02-26 15:38:51,028][19044] Saving new best policy, reward=4.969!
[2023-02-26 15:38:53,731][19058] Updated weights for policy 0, policy_version 180 (0.0031)
[2023-02-26 15:38:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 951.0. Samples: 185626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:38:55,932][00108] Avg episode reward: [(0, '5.107')]
[2023-02-26 15:38:55,935][19044] Saving new best policy, reward=5.107!
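The "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report throughput over three trailing windows: the frame delta divided by the time delta within each window. A small illustrative sketch of that bookkeeping (the `history` buffer and 5-second report cadence are assumptions matching the log, not the framework's actual code):

```python
# Sketch: derive windowed FPS from (timestamp, total_frames) samples.
from collections import deque
import time

history = deque(maxlen=120)  # (timestamp, total_frames) pairs

def windowed_fps(now, total_frames, window):
    in_window = [(t, f) for t, f in history if now - t <= window]
    if not in_window:
        return float("nan")  # matches the very first "Fps is (10 sec: nan, ...)" report
    t0, f0 = in_window[0]    # oldest sample inside the window
    return (total_frames - f0) / max(now - t0, 1e-6)

def report(total_frames):
    now = time.monotonic()
    fps = {w: windowed_fps(now, total_frames, w) for w in (10, 60, 300)}
    history.append((now, total_frames))
    print(f"Fps is (10 sec: {fps[10]:.1f}, 60 sec: {fps[60]:.1f}, "
          f"300 sec: {fps[300]:.1f}). Total num frames: {total_frames}.")
```

For example, the jump from 0 to 12288 frames over the first 10-second window gives the 1228.8 FPS reported at 15:35:45.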
[2023-02-26 15:39:00,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3543.5). Total num frames: 761856. Throughput: 0: 991.4. Samples: 191864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:00,932][00108] Avg episode reward: [(0, '5.237')]
[2023-02-26 15:39:00,962][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000187_765952.pth...
[2023-02-26 15:39:01,076][19044] Saving new best policy, reward=5.237!
[2023-02-26 15:39:03,587][19058] Updated weights for policy 0, policy_version 190 (0.0017)
[2023-02-26 15:39:05,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 1002.4. Samples: 195286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:39:05,931][00108] Avg episode reward: [(0, '5.246')]
[2023-02-26 15:39:05,939][19044] Saving new best policy, reward=5.246!
[2023-02-26 15:39:10,929][00108] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 972.3. Samples: 201116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:10,932][00108] Avg episode reward: [(0, '5.353')]
[2023-02-26 15:39:10,952][19044] Saving new best policy, reward=5.353!
[2023-02-26 15:39:15,705][19058] Updated weights for policy 0, policy_version 200 (0.0016)
[2023-02-26 15:39:15,929][00108] Fps is (10 sec: 3276.7, 60 sec: 3891.3, 300 sec: 3561.7). Total num frames: 819200. Throughput: 0: 939.9. Samples: 205308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:15,935][00108] Avg episode reward: [(0, '5.422')]
[2023-02-26 15:39:15,940][19044] Saving new best policy, reward=5.422!
[2023-02-26 15:39:20,930][00108] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3573.1). Total num frames: 839680. Throughput: 0: 962.0. Samples: 208468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:20,935][00108] Avg episode reward: [(0, '5.599')]
[2023-02-26 15:39:20,948][19044] Saving new best policy, reward=5.599!
[2023-02-26 15:39:24,808][19058] Updated weights for policy 0, policy_version 210 (0.0018)
[2023-02-26 15:39:25,929][00108] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3601.1). Total num frames: 864256. Throughput: 0: 986.9. Samples: 215496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:39:25,932][00108] Avg episode reward: [(0, '5.731')]
[2023-02-26 15:39:25,939][19044] Saving new best policy, reward=5.731!
[2023-02-26 15:39:30,931][00108] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3594.4). Total num frames: 880640. Throughput: 0: 942.4. Samples: 220704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:30,933][00108] Avg episode reward: [(0, '5.552')]
[2023-02-26 15:39:35,929][00108] Fps is (10 sec: 2867.1, 60 sec: 3822.9, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 931.2. Samples: 223016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:39:35,935][00108] Avg episode reward: [(0, '5.801')]
[2023-02-26 15:39:35,941][19044] Saving new best policy, reward=5.801!
[2023-02-26 15:39:37,209][19058] Updated weights for policy 0, policy_version 220 (0.0036)
[2023-02-26 15:39:40,929][00108] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3598.1). Total num frames: 917504. Throughput: 0: 963.8. Samples: 228998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:40,932][00108] Avg episode reward: [(0, '5.429')]
[2023-02-26 15:39:45,747][19058] Updated weights for policy 0, policy_version 230 (0.0014)
[2023-02-26 15:39:45,929][00108] Fps is (10 sec: 4915.3, 60 sec: 3891.2, 300 sec: 3623.4). Total num frames: 942080. Throughput: 0: 981.6. Samples: 236036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:39:45,932][00108] Avg episode reward: [(0, '5.320')]
[2023-02-26 15:39:50,934][00108] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3616.8). Total num frames: 958464. Throughput: 0: 962.9. Samples: 238622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:39:50,936][00108] Avg episode reward: [(0, '5.503')]
[2023-02-26 15:39:55,929][00108] Fps is (10 sec: 2867.1, 60 sec: 3822.9, 300 sec: 3595.4). Total num frames: 970752. Throughput: 0: 934.7. Samples: 243176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:39:55,936][00108] Avg episode reward: [(0, '5.386')]
[2023-02-26 15:39:58,059][19058] Updated weights for policy 0, policy_version 240 (0.0017)
[2023-02-26 15:40:00,929][00108] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3619.4). Total num frames: 995328. Throughput: 0: 978.2. Samples: 249328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:40:00,932][00108] Avg episode reward: [(0, '5.298')]
[2023-02-26 15:40:05,929][00108] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 1015808. Throughput: 0: 984.7. Samples: 252778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:40:05,932][00108] Avg episode reward: [(0, '5.314')]
[2023-02-26 15:40:07,010][19058] Updated weights for policy 0, policy_version 250 (0.0013)
[2023-02-26 15:40:10,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3621.7). Total num frames: 1032192. Throughput: 0: 959.4. Samples: 258668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:40:10,938][00108] Avg episode reward: [(0, '5.455')]
[2023-02-26 15:40:15,931][00108] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3615.8). Total num frames: 1048576. Throughput: 0: 942.0. Samples: 263094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:40:15,934][00108] Avg episode reward: [(0, '5.510')]
[2023-02-26 15:40:19,087][19058] Updated weights for policy 0, policy_version 260 (0.0040)
[2023-02-26 15:40:20,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 965.3. Samples: 266456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:40:20,941][00108] Avg episode reward: [(0, '5.419')]
[2023-02-26 15:40:25,929][00108] Fps is (10 sec: 4916.3, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 993.4. Samples: 273702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:40:25,938][00108] Avg episode reward: [(0, '5.243')]
[2023-02-26 15:40:28,034][19058] Updated weights for policy 0, policy_version 270 (0.0019)
[2023-02-26 15:40:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 953.5. Samples: 278942. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 15:40:30,932][00108] Avg episode reward: [(0, '5.203')]
[2023-02-26 15:40:35,929][00108] Fps is (10 sec: 2867.1, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 944.9. Samples: 281138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 15:40:35,937][00108] Avg episode reward: [(0, '5.753')]
[2023-02-26 15:40:39,603][19058] Updated weights for policy 0, policy_version 280 (0.0015)
[2023-02-26 15:40:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1150976. Throughput: 0: 983.6. Samples: 287440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:40:40,936][00108] Avg episode reward: [(0, '6.105')]
[2023-02-26 15:40:40,948][19044] Saving new best policy, reward=6.105!
[2023-02-26 15:40:45,929][00108] Fps is (10 sec: 4915.4, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 1175552. Throughput: 0: 1003.5. Samples: 294486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:40:45,931][00108] Avg episode reward: [(0, '6.479')]
[2023-02-26 15:40:45,935][19044] Saving new best policy, reward=6.479!
[2023-02-26 15:40:49,630][19058] Updated weights for policy 0, policy_version 290 (0.0020)
[2023-02-26 15:40:50,930][00108] Fps is (10 sec: 3685.9, 60 sec: 3823.1, 300 sec: 3873.9). Total num frames: 1187840. Throughput: 0: 979.2. Samples: 296842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:40:50,937][00108] Avg episode reward: [(0, '6.651')]
[2023-02-26 15:40:50,951][19044] Saving new best policy, reward=6.651!
[2023-02-26 15:40:55,929][00108] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1204224. Throughput: 0: 949.3. Samples: 301388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:40:55,938][00108] Avg episode reward: [(0, '6.615')]
[2023-02-26 15:41:00,538][19058] Updated weights for policy 0, policy_version 300 (0.0028)
[2023-02-26 15:41:00,929][00108] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1228800. Throughput: 0: 998.7. Samples: 308034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:41:00,936][00108] Avg episode reward: [(0, '6.355')]
[2023-02-26 15:41:00,948][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth...
[2023-02-26 15:41:01,069][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2023-02-26 15:41:05,930][00108] Fps is (10 sec: 4914.7, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 1253376. Throughput: 0: 1002.6. Samples: 311576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:41:05,935][00108] Avg episode reward: [(0, '6.705')]
[2023-02-26 15:41:05,940][19044] Saving new best policy, reward=6.705!
[2023-02-26 15:41:10,936][00108] Fps is (10 sec: 3683.7, 60 sec: 3890.7, 300 sec: 3873.8). Total num frames: 1265664. Throughput: 0: 961.8. Samples: 316988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:41:10,939][00108] Avg episode reward: [(0, '6.627')]
[2023-02-26 15:41:10,986][19058] Updated weights for policy 0, policy_version 310 (0.0013)
[2023-02-26 15:41:15,929][00108] Fps is (10 sec: 2867.5, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 1282048. Throughput: 0: 943.5. Samples: 321400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 15:41:15,932][00108] Avg episode reward: [(0, '6.804')]
[2023-02-26 15:41:15,937][19044] Saving new best policy, reward=6.804!
[2023-02-26 15:41:20,929][00108] Fps is (10 sec: 4099.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1306624. Throughput: 0: 972.4. Samples: 324894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:41:20,936][00108] Avg episode reward: [(0, '6.889')]
[2023-02-26 15:41:20,947][19044] Saving new best policy, reward=6.889!
[2023-02-26 15:41:21,567][19058] Updated weights for policy 0, policy_version 320 (0.0012)
[2023-02-26 15:41:25,930][00108] Fps is (10 sec: 4914.6, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 1331200. Throughput: 0: 990.8. Samples: 332026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:41:25,932][00108] Avg episode reward: [(0, '7.041')]
[2023-02-26 15:41:25,938][19044] Saving new best policy, reward=7.041!
[2023-02-26 15:41:30,929][00108] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1343488. Throughput: 0: 942.2. Samples: 336886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:41:30,933][00108] Avg episode reward: [(0, '7.261')]
[2023-02-26 15:41:30,956][19044] Saving new best policy, reward=7.261!
[2023-02-26 15:41:32,806][19058] Updated weights for policy 0, policy_version 330 (0.0020)
[2023-02-26 15:41:35,929][00108] Fps is (10 sec: 2867.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1359872. Throughput: 0: 938.2. Samples: 339060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 15:41:35,937][00108] Avg episode reward: [(0, '7.508')]
[2023-02-26 15:41:35,939][19044] Saving new best policy, reward=7.508!
[2023-02-26 15:41:40,929][00108] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1384448. Throughput: 0: 975.4. Samples: 345282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:41:40,935][00108] Avg episode reward: [(0, '8.404')]
[2023-02-26 15:41:40,946][19044] Saving new best policy, reward=8.404!
[2023-02-26 15:41:42,682][19058] Updated weights for policy 0, policy_version 340 (0.0017)
[2023-02-26 15:41:45,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1404928. Throughput: 0: 985.9. Samples: 352398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:41:45,931][00108] Avg episode reward: [(0, '9.737')]
[2023-02-26 15:41:45,937][19044] Saving new best policy, reward=9.737!
[2023-02-26 15:41:50,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 1421312. Throughput: 0: 956.6. Samples: 354620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:41:50,937][00108] Avg episode reward: [(0, '9.966')]
[2023-02-26 15:41:50,950][19044] Saving new best policy, reward=9.966!
[2023-02-26 15:41:54,456][19058] Updated weights for policy 0, policy_version 350 (0.0020)
[2023-02-26 15:41:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1437696. Throughput: 0: 937.3. Samples: 359160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:41:55,935][00108] Avg episode reward: [(0, '10.216')]
[2023-02-26 15:41:55,939][19044] Saving new best policy, reward=10.216!
[2023-02-26 15:42:00,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1462272. Throughput: 0: 990.0. Samples: 365948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:42:00,935][00108] Avg episode reward: [(0, '9.115')]
[2023-02-26 15:42:04,132][19058] Updated weights for policy 0, policy_version 360 (0.0024)
[2023-02-26 15:42:05,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1478656. Throughput: 0: 979.5. Samples: 368972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:42:05,931][00108] Avg episode reward: [(0, '9.080')]
[2023-02-26 15:42:10,929][00108] Fps is (10 sec: 2867.2, 60 sec: 3755.1, 300 sec: 3846.1). Total num frames: 1490944. Throughput: 0: 905.2. Samples: 372758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:42:10,936][00108] Avg episode reward: [(0, '8.604')]
[2023-02-26 15:42:15,929][00108] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1503232. Throughput: 0: 875.3. Samples: 376276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:42:15,933][00108] Avg episode reward: [(0, '9.205')]
[2023-02-26 15:42:19,579][19058] Updated weights for policy 0, policy_version 370 (0.0024)
[2023-02-26 15:42:20,929][00108] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 1519616. Throughput: 0: 880.0. Samples: 378660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:42:20,935][00108] Avg episode reward: [(0, '9.895')]
[2023-02-26 15:42:25,929][00108] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 1544192. Throughput: 0: 899.6. Samples: 385762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:42:25,932][00108] Avg episode reward: [(0, '10.540')]
[2023-02-26 15:42:25,938][19044] Saving new best policy, reward=10.540!
[2023-02-26 15:42:28,105][19058] Updated weights for policy 0, policy_version 380 (0.0018)
[2023-02-26 15:42:30,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1564672. Throughput: 0: 883.6. Samples: 392158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:42:30,934][00108] Avg episode reward: [(0, '11.291')]
[2023-02-26 15:42:30,949][19044] Saving new best policy, reward=11.291!
[2023-02-26 15:42:35,932][00108] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3818.3). Total num frames: 1581056. Throughput: 0: 882.8. Samples: 394348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:42:35,935][00108] Avg episode reward: [(0, '11.960')]
[2023-02-26 15:42:35,941][19044] Saving new best policy, reward=11.960!
[2023-02-26 15:42:40,271][19058] Updated weights for policy 0, policy_version 390 (0.0013)
[2023-02-26 15:42:40,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 1597440. Throughput: 0: 893.1. Samples: 399350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:42:40,932][00108] Avg episode reward: [(0, '11.872')]
[2023-02-26 15:42:45,929][00108] Fps is (10 sec: 4097.4, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 1622016. Throughput: 0: 893.8. Samples: 406168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:42:45,932][00108] Avg episode reward: [(0, '11.708')]
[2023-02-26 15:42:49,319][19058] Updated weights for policy 0, policy_version 400 (0.0019)
[2023-02-26 15:42:50,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1642496. Throughput: 0: 903.6. Samples: 409636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:42:50,937][00108] Avg episode reward: [(0, '12.033')]
[2023-02-26 15:42:50,955][19044] Saving new best policy, reward=12.033!
[2023-02-26 15:42:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1654784. Throughput: 0: 915.7. Samples: 413966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:42:55,935][00108] Avg episode reward: [(0, '11.684')]
[2023-02-26 15:43:00,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 1675264. Throughput: 0: 956.2. Samples: 419306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:43:00,935][00108] Avg episode reward: [(0, '12.414')]
[2023-02-26 15:43:00,946][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth...
[2023-02-26 15:43:01,056][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000187_765952.pth
[2023-02-26 15:43:01,075][19044] Saving new best policy, reward=12.414!
[2023-02-26 15:43:01,877][19058] Updated weights for policy 0, policy_version 410 (0.0011)
[2023-02-26 15:43:05,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 1695744. Throughput: 0: 978.0. Samples: 422668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:43:05,932][00108] Avg episode reward: [(0, '12.574')]
[2023-02-26 15:43:05,934][19044] Saving new best policy, reward=12.574!
[2023-02-26 15:43:10,929][00108] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1716224. Throughput: 0: 961.4. Samples: 429024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:43:10,932][00108] Avg episode reward: [(0, '13.015')]
[2023-02-26 15:43:10,942][19044] Saving new best policy, reward=13.015!
[2023-02-26 15:43:12,028][19058] Updated weights for policy 0, policy_version 420 (0.0021)
[2023-02-26 15:43:15,929][00108] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 1728512. Throughput: 0: 912.9. Samples: 433238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:43:15,935][00108] Avg episode reward: [(0, '13.690')]
[2023-02-26 15:43:15,944][19044] Saving new best policy, reward=13.690!
[2023-02-26 15:43:20,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1748992. Throughput: 0: 916.7. Samples: 435596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 15:43:20,936][00108] Avg episode reward: [(0, '13.084')]
[2023-02-26 15:43:23,279][19058] Updated weights for policy 0, policy_version 430 (0.0039)
[2023-02-26 15:43:25,929][00108] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1769472. Throughput: 0: 959.1. Samples: 442510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:43:25,935][00108] Avg episode reward: [(0, '13.542')]
[2023-02-26 15:43:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1789952. Throughput: 0: 942.7. Samples: 448590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-26 15:43:30,935][00108] Avg episode reward: [(0, '13.688')]
[2023-02-26 15:43:34,146][19058] Updated weights for policy 0, policy_version 440 (0.0015)
[2023-02-26 15:43:35,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3804.4). Total num frames: 1806336. Throughput: 0: 915.6. Samples: 450838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:43:35,936][00108] Avg episode reward: [(0, '13.724')]
[2023-02-26 15:43:35,939][19044] Saving new best policy, reward=13.724!
[2023-02-26 15:43:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1826816. Throughput: 0: 932.0. Samples: 455904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:43:40,932][00108] Avg episode reward: [(0, '14.036')]
[2023-02-26 15:43:40,949][19044] Saving new best policy, reward=14.036!
[2023-02-26 15:43:44,449][19058] Updated weights for policy 0, policy_version 450 (0.0012)
[2023-02-26 15:43:45,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1847296. Throughput: 0: 970.3. Samples: 462970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:43:45,935][00108] Avg episode reward: [(0, '14.141')]
[2023-02-26 15:43:45,940][19044] Saving new best policy, reward=14.141!
[2023-02-26 15:43:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1867776. Throughput: 0: 972.4. Samples: 466428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:43:50,936][00108] Avg episode reward: [(0, '15.435')]
[2023-02-26 15:43:50,949][19044] Saving new best policy, reward=15.435!
[2023-02-26 15:43:55,687][19058] Updated weights for policy 0, policy_version 460 (0.0027)
[2023-02-26 15:43:55,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1884160. Throughput: 0: 931.7. Samples: 470950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:43:55,940][00108] Avg episode reward: [(0, '16.133')]
[2023-02-26 15:43:55,942][19044] Saving new best policy, reward=16.133!
[2023-02-26 15:44:00,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1900544. Throughput: 0: 953.2. Samples: 476134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:44:00,937][00108] Avg episode reward: [(0, '16.715')]
[2023-02-26 15:44:00,949][19044] Saving new best policy, reward=16.715!
[2023-02-26 15:44:05,519][19058] Updated weights for policy 0, policy_version 470 (0.0023)
[2023-02-26 15:44:05,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1925120. Throughput: 0: 979.9. Samples: 479692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:44:05,935][00108] Avg episode reward: [(0, '16.996')]
[2023-02-26 15:44:05,940][19044] Saving new best policy, reward=16.996!
[2023-02-26 15:44:10,931][00108] Fps is (10 sec: 4504.8, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 1945600. Throughput: 0: 973.7. Samples: 486328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:44:10,935][00108] Avg episode reward: [(0, '16.345')]
[2023-02-26 15:44:15,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1957888. Throughput: 0: 938.0. Samples: 490802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:44:15,940][00108] Avg episode reward: [(0, '16.614')]
[2023-02-26 15:44:17,426][19058] Updated weights for policy 0, policy_version 480 (0.0015)
[2023-02-26 15:44:20,929][00108] Fps is (10 sec: 3277.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1978368. Throughput: 0: 939.3. Samples: 493106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:44:20,935][00108] Avg episode reward: [(0, '17.430')]
[2023-02-26 15:44:20,946][19044] Saving new best policy, reward=17.430!
[2023-02-26 15:44:25,931][00108] Fps is (10 sec: 4505.0, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 2002944. Throughput: 0: 978.5. Samples: 499938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:44:25,937][00108] Avg episode reward: [(0, '18.155')]
[2023-02-26 15:44:25,945][19044] Saving new best policy, reward=18.155!
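Two kinds of saves are interleaved in the log: periodic checkpoints named checkpoint_<policy_version>_<env_frames>.pth with only the most recent two kept (each "Saving ..." is eventually followed by a "Removing ..." of the oldest), and a separate best-policy save whenever the average episode reward improves. A hypothetical sketch of that bookkeeping, not Sample Factory's actual code:

```python
# Sketch: rolling checkpoints plus a best-policy save, mirroring the log.
import os
import torch

_recent_checkpoints = []           # paths of checkpoints still on disk
_best_reward = float("-inf")

def save_checkpoint(state, ckpt_dir, policy_version, env_frames, keep_last=2):
    # e.g. checkpoint_000000073_299008.pth for version 73 at 299008 frames
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
    print(f"Saving {path}...")
    torch.save(state, path)
    _recent_checkpoints.append(path)
    while len(_recent_checkpoints) > keep_last:
        stale = _recent_checkpoints.pop(0)
        print(f"Removing {stale}")
        os.remove(stale)

def maybe_save_best(state, ckpt_dir, avg_reward):
    global _best_reward
    if avg_reward > _best_reward:
        _best_reward = avg_reward
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        torch.save(state, os.path.join(ckpt_dir, "best_policy.pth"))
```

Keeping only the last two rolling checkpoints bounds disk usage while still allowing a restart from a recent state; the best-policy file is kept independently of the rotation.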
[2023-02-26 15:44:26,973][19058] Updated weights for policy 0, policy_version 490 (0.0019)
[2023-02-26 15:44:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2019328. Throughput: 0: 952.4. Samples: 505830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:44:30,937][00108] Avg episode reward: [(0, '18.484')]
[2023-02-26 15:44:30,954][19044] Saving new best policy, reward=18.484!
[2023-02-26 15:44:35,929][00108] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2035712. Throughput: 0: 922.7. Samples: 507948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:44:35,934][00108] Avg episode reward: [(0, '17.288')]
[2023-02-26 15:44:39,343][19058] Updated weights for policy 0, policy_version 500 (0.0032)
[2023-02-26 15:44:40,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2052096. Throughput: 0: 937.2. Samples: 513122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:44:40,936][00108] Avg episode reward: [(0, '17.119')]
[2023-02-26 15:44:45,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 2076672. Throughput: 0: 979.2. Samples: 520200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:44:45,932][00108] Avg episode reward: [(0, '17.287')]
[2023-02-26 15:44:47,996][19058] Updated weights for policy 0, policy_version 510 (0.0014)
[2023-02-26 15:44:50,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2097152. Throughput: 0: 975.8. Samples: 523602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:44:50,932][00108] Avg episode reward: [(0, '16.104')]
[2023-02-26 15:44:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2109440. Throughput: 0: 925.0. Samples: 527950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:44:55,933][00108] Avg episode reward: [(0, '17.305')]
[2023-02-26 15:45:00,350][19058] Updated weights for policy 0, policy_version 520 (0.0017)
[2023-02-26 15:45:00,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2129920. Throughput: 0: 948.8. Samples: 533498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:45:00,932][00108] Avg episode reward: [(0, '17.305')]
[2023-02-26 15:45:00,942][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth...
[2023-02-26 15:45:01,069][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth
[2023-02-26 15:45:05,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2154496. Throughput: 0: 975.0. Samples: 536982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:45:05,932][00108] Avg episode reward: [(0, '18.824')]
[2023-02-26 15:45:05,941][19044] Saving new best policy, reward=18.824!
[2023-02-26 15:45:09,676][19058] Updated weights for policy 0, policy_version 530 (0.0014)
[2023-02-26 15:45:10,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 2170880. Throughput: 0: 963.1. Samples: 543276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:45:10,932][00108] Avg episode reward: [(0, '18.428')]
[2023-02-26 15:45:15,931][00108] Fps is (10 sec: 3276.1, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 2187264. Throughput: 0: 932.0. Samples: 547774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:45:15,938][00108] Avg episode reward: [(0, '19.734')]
[2023-02-26 15:45:15,943][19044] Saving new best policy, reward=19.734!
[2023-02-26 15:45:20,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2207744. Throughput: 0: 942.2. Samples: 550346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:45:20,931][00108] Avg episode reward: [(0, '18.923')]
[2023-02-26 15:45:21,463][19058] Updated weights for policy 0, policy_version 540 (0.0014)
[2023-02-26 15:45:25,929][00108] Fps is (10 sec: 4506.6, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 2232320. Throughput: 0: 985.3. Samples: 557460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:45:25,932][00108] Avg episode reward: [(0, '20.589')]
[2023-02-26 15:45:25,934][19044] Saving new best policy, reward=20.589!
[2023-02-26 15:45:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2248704. Throughput: 0: 960.2. Samples: 563408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:45:30,937][00108] Avg episode reward: [(0, '19.411')]
[2023-02-26 15:45:31,051][19058] Updated weights for policy 0, policy_version 550 (0.0017)
[2023-02-26 15:45:35,929][00108] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2265088. Throughput: 0: 934.4. Samples: 565652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 15:45:35,937][00108] Avg episode reward: [(0, '20.662')]
[2023-02-26 15:45:35,942][19044] Saving new best policy, reward=20.662!
[2023-02-26 15:45:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2285568. Throughput: 0: 958.6. Samples: 571086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:45:40,931][00108] Avg episode reward: [(0, '20.036')]
[2023-02-26 15:45:42,167][19058] Updated weights for policy 0, policy_version 560 (0.0021)
[2023-02-26 15:45:45,929][00108] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2310144. Throughput: 0: 998.6. Samples: 578436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 15:45:45,936][00108] Avg episode reward: [(0, '20.134')]
[2023-02-26 15:45:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2326528. Throughput: 0: 990.4. Samples: 581550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:45:50,934][00108] Avg episode reward: [(0, '19.476')]
[2023-02-26 15:45:52,462][19058] Updated weights for policy 0, policy_version 570 (0.0021)
[2023-02-26 15:45:55,930][00108] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 2342912. Throughput: 0: 949.0. Samples: 585982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:45:55,937][00108] Avg episode reward: [(0, '20.845')]
[2023-02-26 15:45:55,939][19044] Saving new best policy, reward=20.845!
[2023-02-26 15:46:00,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2363392. Throughput: 0: 975.2. Samples: 591654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:46:00,932][00108] Avg episode reward: [(0, '19.459')]
[2023-02-26 15:46:03,159][19058] Updated weights for policy 0, policy_version 580 (0.0024)
[2023-02-26 15:46:05,929][00108] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3804.5). Total num frames: 2387968. Throughput: 0: 995.3. Samples: 595134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:46:05,931][00108] Avg episode reward: [(0, '20.199')]
[2023-02-26 15:46:10,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2404352. Throughput: 0: 970.8. Samples: 601144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:46:10,938][00108] Avg episode reward: [(0, '20.732')]
[2023-02-26 15:46:14,535][19058] Updated weights for policy 0, policy_version 590 (0.0014)
[2023-02-26 15:46:15,930][00108] Fps is (10 sec: 3276.5, 60 sec: 3891.3, 300 sec: 3776.6). Total num frames: 2420736. Throughput: 0: 939.1. Samples: 605668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:46:15,939][00108] Avg episode reward: [(0, '21.755')]
[2023-02-26 15:46:15,942][19044] Saving new best policy, reward=21.755!
[2023-02-26 15:46:20,929][00108] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2441216. Throughput: 0: 947.1. Samples: 608270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:46:20,936][00108] Avg episode reward: [(0, '21.299')]
[2023-02-26 15:46:24,440][19058] Updated weights for policy 0, policy_version 600 (0.0023)
[2023-02-26 15:46:25,929][00108] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2461696. Throughput: 0: 983.6. Samples: 615346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:46:25,932][00108] Avg episode reward: [(0, '21.249')]
[2023-02-26 15:46:30,929][00108] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2482176. Throughput: 0: 950.0. Samples: 621188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:46:30,935][00108] Avg episode reward: [(0, '21.405')]
[2023-02-26 15:46:35,933][00108] Fps is (10 sec: 3275.6, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 2494464. Throughput: 0: 932.1. Samples: 623496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:46:35,936][00108] Avg episode reward: [(0, '19.733')]
[2023-02-26 15:46:35,993][19058] Updated weights for policy 0, policy_version 610 (0.0025)
[2023-02-26 15:46:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2519040. Throughput: 0: 958.8. Samples: 629126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:46:40,935][00108] Avg episode reward: [(0, '19.198')]
[2023-02-26 15:46:44,992][19058] Updated weights for policy 0, policy_version 620 (0.0012)
[2023-02-26 15:46:45,929][00108] Fps is (10 sec: 4916.9, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2543616. Throughput: 0: 995.5. Samples: 636452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:46:45,932][00108] Avg episode reward: [(0, '18.190')]
[2023-02-26 15:46:50,931][00108] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 2560000. Throughput: 0: 987.1. Samples: 639558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:46:50,934][00108] Avg episode reward: [(0, '18.311')]
[2023-02-26 15:46:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2576384. Throughput: 0: 954.6. Samples: 644102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 15:46:55,932][00108] Avg episode reward: [(0, '19.461')]
[2023-02-26 15:46:56,643][19058] Updated weights for policy 0, policy_version 630 (0.0017)
[2023-02-26 15:47:00,929][00108] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2596864. Throughput: 0: 986.1. Samples: 650044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:47:00,932][00108] Avg episode reward: [(0, '19.347')]
[2023-02-26 15:47:00,946][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000634_2596864.pth...
[2023-02-26 15:47:01,054][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth
[2023-02-26 15:47:05,772][19058] Updated weights for policy 0, policy_version 640 (0.0011)
[2023-02-26 15:47:05,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2621440. Throughput: 0: 1007.5. Samples: 653606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:47:05,935][00108] Avg episode reward: [(0, '19.370')]
[2023-02-26 15:47:10,937][00108] Fps is (10 sec: 4092.6, 60 sec: 3890.7, 300 sec: 3846.0). Total num frames: 2637824. Throughput: 0: 985.7. Samples: 659712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:47:10,944][00108] Avg episode reward: [(0, '20.145')]
[2023-02-26 15:47:15,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2654208. Throughput: 0: 956.3. Samples: 664220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:47:15,934][00108] Avg episode reward: [(0, '20.165')]
[2023-02-26 15:47:18,010][19058] Updated weights for policy 0, policy_version 650 (0.0018)
[2023-02-26 15:47:20,929][00108] Fps is (10 sec: 3689.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2674688. Throughput: 0: 967.3. Samples: 667022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:47:20,936][00108] Avg episode reward: [(0, '20.122')]
[2023-02-26 15:47:25,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2699264. Throughput: 0: 1002.8. Samples: 674250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:47:25,932][00108] Avg episode reward: [(0, '21.189')]
[2023-02-26 15:47:26,561][19058] Updated weights for policy 0, policy_version 660 (0.0031)
[2023-02-26 15:47:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2715648. Throughput: 0: 968.9. Samples: 680054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:47:30,933][00108] Avg episode reward: [(0, '22.465')]
[2023-02-26 15:47:30,948][19044] Saving new best policy, reward=22.465!
[2023-02-26 15:47:35,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 3846.1). Total num frames: 2732032. Throughput: 0: 948.5. Samples: 682238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:47:35,931][00108] Avg episode reward: [(0, '23.799')]
[2023-02-26 15:47:35,937][19044] Saving new best policy, reward=23.799!
[2023-02-26 15:47:39,033][19058] Updated weights for policy 0, policy_version 670 (0.0022)
[2023-02-26 15:47:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2752512. Throughput: 0: 966.6. Samples: 687600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:47:40,935][00108] Avg episode reward: [(0, '24.852')]
[2023-02-26 15:47:40,947][19044] Saving new best policy, reward=24.852!
[2023-02-26 15:47:45,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2777088. Throughput: 0: 994.5. Samples: 694798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-26 15:47:45,931][00108] Avg episode reward: [(0, '22.719')]
[2023-02-26 15:47:47,674][19058] Updated weights for policy 0, policy_version 680 (0.0021)
[2023-02-26 15:47:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 2793472. Throughput: 0: 984.0. Samples: 697886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:47:50,932][00108] Avg episode reward: [(0, '21.890')]
[2023-02-26 15:47:55,930][00108] Fps is (10 sec: 3276.4, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2809856. Throughput: 0: 949.2. Samples: 702420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 15:47:55,932][00108] Avg episode reward: [(0, '21.881')]
[2023-02-26 15:47:59,769][19058] Updated weights for policy 0, policy_version 690 (0.0012)
[2023-02-26 15:48:00,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2830336. Throughput: 0: 979.7. Samples: 708308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 15:48:00,931][00108] Avg episode reward: [(0, '20.274')]
[2023-02-26 15:48:05,931][00108] Fps is (10 sec: 4095.6, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 2850816. Throughput: 0: 995.1. Samples: 711804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:48:05,939][00108] Avg episode reward: [(0, '20.279')]
[2023-02-26 15:48:08,847][19058] Updated weights for policy 0, policy_version 700 (0.0012)
[2023-02-26 15:48:10,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.7, 300 sec: 3873.8). Total num frames: 2871296. Throughput: 0: 970.2. Samples: 717910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:48:10,937][00108] Avg episode reward: [(0, '22.016')]
[2023-02-26 15:48:15,932][00108] Fps is (10 sec: 3686.2, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 2887680. Throughput: 0: 940.6. Samples: 722382. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 15:48:15,934][00108] Avg episode reward: [(0, '21.729')]
[2023-02-26 15:48:20,713][19058] Updated weights for policy 0, policy_version 710 (0.0024)
[2023-02-26 15:48:20,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2908160. Throughput: 0: 954.6. Samples: 725194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:48:20,931][00108] Avg episode reward: [(0, '22.130')]
[2023-02-26 15:48:25,929][00108] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2928640. Throughput: 0: 995.4. Samples: 732392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 15:48:25,932][00108] Avg episode reward: [(0, '21.372')]
[2023-02-26 15:48:30,563][19058] Updated weights for policy 0, policy_version 720 (0.0017)
[2023-02-26 15:48:30,934][00108] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3873.8). Total num frames: 2949120. Throughput: 0: 956.8. Samples: 737858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 15:48:30,938][00108] Avg episode reward: [(0, '21.686')]
[2023-02-26 15:48:35,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2961408. Throughput: 0: 937.3. Samples: 740066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 15:48:35,937][00108] Avg episode reward: [(0, '21.293')]
[2023-02-26 15:48:40,929][00108] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2985984. Throughput: 0: 961.9. Samples: 745706.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:48:40,931][00108] Avg episode reward: [(0, '21.662')] [2023-02-26 15:48:41,814][19058] Updated weights for policy 0, policy_version 730 (0.0018) [2023-02-26 15:48:45,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3006464. Throughput: 0: 992.4. Samples: 752966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:48:45,932][00108] Avg episode reward: [(0, '22.452')] [2023-02-26 15:48:50,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3026944. Throughput: 0: 979.6. Samples: 755886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:48:50,933][00108] Avg episode reward: [(0, '22.839')] [2023-02-26 15:48:51,955][19058] Updated weights for policy 0, policy_version 740 (0.0011) [2023-02-26 15:48:55,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3039232. Throughput: 0: 941.1. Samples: 760260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:48:55,934][00108] Avg episode reward: [(0, '22.264')] [2023-02-26 15:49:00,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3059712. Throughput: 0: 974.3. Samples: 766222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:49:00,935][00108] Avg episode reward: [(0, '21.306')] [2023-02-26 15:49:00,948][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth... [2023-02-26 15:49:01,073][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth [2023-02-26 15:49:02,774][19058] Updated weights for policy 0, policy_version 750 (0.0011) [2023-02-26 15:49:05,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 3084288. Throughput: 0: 987.6. Samples: 769638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:49:05,935][00108] Avg episode reward: [(0, '21.204')] [2023-02-26 15:49:10,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3100672. Throughput: 0: 960.3. Samples: 775604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:49:10,934][00108] Avg episode reward: [(0, '21.001')] [2023-02-26 15:49:14,099][19058] Updated weights for policy 0, policy_version 760 (0.0019) [2023-02-26 15:49:15,930][00108] Fps is (10 sec: 3276.5, 60 sec: 3823.1, 300 sec: 3859.9). Total num frames: 3117056. Throughput: 0: 936.1. Samples: 779978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:49:15,934][00108] Avg episode reward: [(0, '19.988')] [2023-02-26 15:49:20,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3137536. Throughput: 0: 954.6. Samples: 783024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 15:49:20,932][00108] Avg episode reward: [(0, '21.157')] [2023-02-26 15:49:23,648][19058] Updated weights for policy 0, policy_version 770 (0.0016) [2023-02-26 15:49:25,929][00108] Fps is (10 sec: 4506.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3162112. Throughput: 0: 990.7. Samples: 790286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 15:49:25,937][00108] Avg episode reward: [(0, '20.949')] [2023-02-26 15:49:30,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3873.8). Total num frames: 3178496. Throughput: 0: 948.4. Samples: 795644. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 15:49:30,934][00108] Avg episode reward: [(0, '20.693')] [2023-02-26 15:49:35,563][19058] Updated weights for policy 0, policy_version 780 (0.0019) [2023-02-26 15:49:35,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3194880. Throughput: 0: 931.4. Samples: 797800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 15:49:35,938][00108] Avg episode reward: [(0, '20.599')] [2023-02-26 15:49:40,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3215360. Throughput: 0: 967.9. Samples: 803814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:49:40,931][00108] Avg episode reward: [(0, '18.425')] [2023-02-26 15:49:44,654][19058] Updated weights for policy 0, policy_version 790 (0.0014) [2023-02-26 15:49:45,929][00108] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3239936. Throughput: 0: 992.3. Samples: 810876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 15:49:45,932][00108] Avg episode reward: [(0, '17.829')] [2023-02-26 15:49:50,936][00108] Fps is (10 sec: 4093.3, 60 sec: 3822.5, 300 sec: 3887.6). Total num frames: 3256320. Throughput: 0: 972.3. Samples: 813398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:49:50,941][00108] Avg episode reward: [(0, '19.051')] [2023-02-26 15:49:55,929][00108] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3272704. Throughput: 0: 939.6. Samples: 817888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:49:55,936][00108] Avg episode reward: [(0, '19.546')] [2023-02-26 15:49:56,785][19058] Updated weights for policy 0, policy_version 800 (0.0013) [2023-02-26 15:50:00,932][00108] Fps is (10 sec: 3687.9, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 3293184. Throughput: 0: 988.1. Samples: 824444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:50:00,939][00108] Avg episode reward: [(0, '21.855')] [2023-02-26 15:50:05,362][19058] Updated weights for policy 0, policy_version 810 (0.0013) [2023-02-26 15:50:05,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3317760. Throughput: 0: 997.2. Samples: 827900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:50:05,932][00108] Avg episode reward: [(0, '22.867')] [2023-02-26 15:50:10,932][00108] Fps is (10 sec: 4096.1, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 3334144. Throughput: 0: 955.5. Samples: 833288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 15:50:10,935][00108] Avg episode reward: [(0, '23.273')] [2023-02-26 15:50:15,932][00108] Fps is (10 sec: 2456.8, 60 sec: 3754.5, 300 sec: 3846.0). Total num frames: 3342336. Throughput: 0: 915.4. Samples: 836838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:50:15,937][00108] Avg episode reward: [(0, '22.637')] [2023-02-26 15:50:20,929][00108] Fps is (10 sec: 2048.5, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 3354624. Throughput: 0: 903.5. Samples: 838458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 15:50:20,935][00108] Avg episode reward: [(0, '23.200')] [2023-02-26 15:50:21,298][19058] Updated weights for policy 0, policy_version 820 (0.0030) [2023-02-26 15:50:25,929][00108] Fps is (10 sec: 3687.6, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 3379200. Throughput: 0: 896.4. Samples: 844152. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:50:25,931][00108] Avg episode reward: [(0, '22.038')] [2023-02-26 15:50:30,075][19058] Updated weights for policy 0, policy_version 830 (0.0027) [2023-02-26 15:50:30,929][00108] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3399680. Throughput: 0: 890.4. Samples: 850942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:50:30,936][00108] Avg episode reward: [(0, '20.761')] [2023-02-26 15:50:35,932][00108] Fps is (10 sec: 3685.5, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 3416064. Throughput: 0: 884.3. Samples: 853186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:50:35,934][00108] Avg episode reward: [(0, '19.743')] [2023-02-26 15:50:40,929][00108] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 3432448. Throughput: 0: 884.8. Samples: 857706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:50:40,936][00108] Avg episode reward: [(0, '20.624')] [2023-02-26 15:50:42,218][19058] Updated weights for policy 0, policy_version 840 (0.0020) [2023-02-26 15:50:45,929][00108] Fps is (10 sec: 4097.0, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 3457024. Throughput: 0: 898.9. Samples: 864890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 15:50:45,936][00108] Avg episode reward: [(0, '21.360')] [2023-02-26 15:50:50,929][00108] Fps is (10 sec: 4505.7, 60 sec: 3686.8, 300 sec: 3846.1). Total num frames: 3477504. Throughput: 0: 900.9. Samples: 868442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:50:50,932][00108] Avg episode reward: [(0, '22.645')] [2023-02-26 15:50:51,488][19058] Updated weights for policy 0, policy_version 850 (0.0017) [2023-02-26 15:50:55,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3493888. Throughput: 0: 890.3. Samples: 873350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:50:55,933][00108] Avg episode reward: [(0, '22.377')] [2023-02-26 15:51:00,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3804.4). Total num frames: 3510272. Throughput: 0: 920.8. Samples: 878270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:51:00,940][00108] Avg episode reward: [(0, '23.541')] [2023-02-26 15:51:00,950][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000857_3510272.pth... [2023-02-26 15:51:01,081][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000634_2596864.pth [2023-02-26 15:51:03,356][19058] Updated weights for policy 0, policy_version 860 (0.0029) [2023-02-26 15:51:05,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 3530752. Throughput: 0: 960.0. Samples: 881656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:51:05,932][00108] Avg episode reward: [(0, '23.118')] [2023-02-26 15:51:10,936][00108] Fps is (10 sec: 4502.7, 60 sec: 3686.2, 300 sec: 3846.0). Total num frames: 3555328. Throughput: 0: 991.2. Samples: 888764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 15:51:10,940][00108] Avg episode reward: [(0, '22.592')] [2023-02-26 15:51:13,341][19058] Updated weights for policy 0, policy_version 870 (0.0029) [2023-02-26 15:51:15,932][00108] Fps is (10 sec: 3685.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3567616. Throughput: 0: 940.4. Samples: 893264. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 15:51:15,943][00108] Avg episode reward: [(0, '22.471')] [2023-02-26 15:51:20,929][00108] Fps is (10 sec: 2869.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3584000. Throughput: 0: 940.2. Samples: 895492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:51:20,936][00108] Avg episode reward: [(0, '22.351')] [2023-02-26 15:51:24,469][19058] Updated weights for policy 0, policy_version 880 (0.0033) [2023-02-26 15:51:25,929][00108] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3608576. Throughput: 0: 988.2. Samples: 902174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:51:25,937][00108] Avg episode reward: [(0, '21.398')] [2023-02-26 15:51:30,929][00108] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3629056. Throughput: 0: 973.3. Samples: 908690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 15:51:30,933][00108] Avg episode reward: [(0, '21.586')] [2023-02-26 15:51:35,249][19058] Updated weights for policy 0, policy_version 890 (0.0013) [2023-02-26 15:51:35,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3818.3). Total num frames: 3645440. Throughput: 0: 943.4. Samples: 910896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:51:35,933][00108] Avg episode reward: [(0, '21.874')] [2023-02-26 15:51:40,929][00108] Fps is (10 sec: 3276.9, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 3661824. Throughput: 0: 937.0. Samples: 915516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 15:51:40,932][00108] Avg episode reward: [(0, '21.567')] [2023-02-26 15:51:45,294][19058] Updated weights for policy 0, policy_version 900 (0.0019) [2023-02-26 15:51:45,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3686400. Throughput: 0: 988.3. Samples: 922744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:51:45,932][00108] Avg episode reward: [(0, '22.728')] [2023-02-26 15:51:50,929][00108] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3706880. Throughput: 0: 992.3. Samples: 926312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:51:50,933][00108] Avg episode reward: [(0, '23.397')] [2023-02-26 15:51:55,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3723264. Throughput: 0: 939.2. Samples: 931024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:51:55,935][00108] Avg episode reward: [(0, '23.825')] [2023-02-26 15:51:56,646][19058] Updated weights for policy 0, policy_version 910 (0.0018) [2023-02-26 15:52:00,929][00108] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3739648. Throughput: 0: 950.9. Samples: 936054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 15:52:00,932][00108] Avg episode reward: [(0, '23.285')] [2023-02-26 15:52:05,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.4). Total num frames: 3764224. Throughput: 0: 978.9. Samples: 939542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 15:52:05,931][00108] Avg episode reward: [(0, '22.858')] [2023-02-26 15:52:06,354][19058] Updated weights for policy 0, policy_version 920 (0.0012) [2023-02-26 15:52:10,931][00108] Fps is (10 sec: 4505.0, 60 sec: 3823.3, 300 sec: 3832.2). Total num frames: 3784704. Throughput: 0: 975.8. Samples: 946086. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:52:10,943][00108] Avg episode reward: [(0, '23.156')] [2023-02-26 15:52:15,930][00108] Fps is (10 sec: 3276.4, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 3796992. Throughput: 0: 925.8. Samples: 950354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:52:15,935][00108] Avg episode reward: [(0, '23.254')] [2023-02-26 15:52:19,093][19058] Updated weights for policy 0, policy_version 930 (0.0019) [2023-02-26 15:52:20,929][00108] Fps is (10 sec: 3277.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3817472. Throughput: 0: 925.6. Samples: 952550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:52:20,936][00108] Avg episode reward: [(0, '22.558')] [2023-02-26 15:52:25,929][00108] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3837952. Throughput: 0: 972.5. Samples: 959280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:52:25,939][00108] Avg episode reward: [(0, '23.509')] [2023-02-26 15:52:28,033][19058] Updated weights for policy 0, policy_version 940 (0.0018) [2023-02-26 15:52:30,930][00108] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3858432. Throughput: 0: 944.5. Samples: 965246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:52:30,935][00108] Avg episode reward: [(0, '25.029')] [2023-02-26 15:52:30,956][19044] Saving new best policy, reward=25.029! [2023-02-26 15:52:35,934][00108] Fps is (10 sec: 3275.4, 60 sec: 3754.4, 300 sec: 3790.5). Total num frames: 3870720. Throughput: 0: 911.0. Samples: 967312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 15:52:35,936][00108] Avg episode reward: [(0, '24.606')] [2023-02-26 15:52:40,767][19058] Updated weights for policy 0, policy_version 950 (0.0034) [2023-02-26 15:52:40,929][00108] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3891200. Throughput: 0: 917.0. Samples: 972288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 15:52:40,932][00108] Avg episode reward: [(0, '23.079')] [2023-02-26 15:52:45,929][00108] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3911680. Throughput: 0: 948.4. Samples: 978730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:52:45,935][00108] Avg episode reward: [(0, '22.342')] [2023-02-26 15:52:50,929][00108] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 3928064. Throughput: 0: 940.1. Samples: 981848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 15:52:50,935][00108] Avg episode reward: [(0, '22.408')] [2023-02-26 15:52:51,345][19058] Updated weights for policy 0, policy_version 960 (0.0024) [2023-02-26 15:52:55,929][00108] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3940352. Throughput: 0: 883.9. Samples: 985862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 15:52:55,932][00108] Avg episode reward: [(0, '22.961')] [2023-02-26 15:53:00,929][00108] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3960832. Throughput: 0: 900.9. Samples: 990892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:53:00,937][00108] Avg episode reward: [(0, '22.975')] [2023-02-26 15:53:00,949][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000967_3960832.pth... 
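The Saving line above and the Removing line that follows are the learner's checkpoint rotation: each time a new checkpoint_<version>_<frames>.pth is written, the oldest regular checkpoint is deleted so only the latest few survive (the "Saving new best policy" snapshots are handled separately). A minimal sketch of that keep-latest pattern, assuming the directory layout visible in this log; the helper below is illustrative, not Sample Factory's actual API:

    import re
    from pathlib import Path

    CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")  # captures policy_version, env_frames

    def rotate_checkpoints(ckpt_dir: str, keep_last: int = 2) -> None:
        """Delete all but the `keep_last` most recent checkpoints in ckpt_dir."""
        ckpts = []
        for p in Path(ckpt_dir).glob("checkpoint_*.pth"):
            m = CKPT_RE.fullmatch(p.name)
            if m:
                ckpts.append((int(m.group(1)), p))  # sort by policy version
        ckpts.sort()
        for _, old in ckpts[:-keep_last]:
            print(f"Removing {old}")
            old.unlink()

    # e.g. rotate_checkpoints("/content/train_dir/default_experiment/checkpoint_p0", keep_last=2)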
[2023-02-26 15:53:01,075][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth [2023-02-26 15:53:03,549][19058] Updated weights for policy 0, policy_version 970 (0.0025) [2023-02-26 15:53:05,929][00108] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3981312. Throughput: 0: 925.6. Samples: 994204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:53:05,939][00108] Avg episode reward: [(0, '23.388')] [2023-02-26 15:53:10,936][00108] Fps is (10 sec: 4093.0, 60 sec: 3617.8, 300 sec: 3776.6). Total num frames: 4001792. Throughput: 0: 913.2. Samples: 1000380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 15:53:10,942][00108] Avg episode reward: [(0, '23.384')] [2023-02-26 15:53:12,281][19044] Stopping Batcher_0... [2023-02-26 15:53:12,282][19044] Loop batcher_evt_loop terminating... [2023-02-26 15:53:12,284][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 15:53:12,281][00108] Component Batcher_0 stopped! [2023-02-26 15:53:12,351][19058] Weights refcount: 2 0 [2023-02-26 15:53:12,372][00108] Component RolloutWorker_w5 stopped! [2023-02-26 15:53:12,371][19064] Stopping RolloutWorker_w5... [2023-02-26 15:53:12,381][19058] Stopping InferenceWorker_p0-w0... [2023-02-26 15:53:12,381][19058] Loop inference_proc0-0_evt_loop terminating... [2023-02-26 15:53:12,381][00108] Component InferenceWorker_p0-w0 stopped! [2023-02-26 15:53:12,390][19064] Loop rollout_proc5_evt_loop terminating... [2023-02-26 15:53:12,408][00108] Component RolloutWorker_w1 stopped! [2023-02-26 15:53:12,411][19060] Stopping RolloutWorker_w1... [2023-02-26 15:53:12,412][19060] Loop rollout_proc1_evt_loop terminating... [2023-02-26 15:53:12,428][00108] Component RolloutWorker_w3 stopped! [2023-02-26 15:53:12,430][19061] Stopping RolloutWorker_w2... [2023-02-26 15:53:12,431][19061] Loop rollout_proc2_evt_loop terminating... [2023-02-26 15:53:12,431][00108] Component RolloutWorker_w2 stopped! [2023-02-26 15:53:12,436][19062] Stopping RolloutWorker_w3... [2023-02-26 15:53:12,439][19062] Loop rollout_proc3_evt_loop terminating... [2023-02-26 15:53:12,452][19066] Stopping RolloutWorker_w7... [2023-02-26 15:53:12,453][19066] Loop rollout_proc7_evt_loop terminating... [2023-02-26 15:53:12,452][00108] Component RolloutWorker_w7 stopped! [2023-02-26 15:53:12,465][00108] Component RolloutWorker_w4 stopped! [2023-02-26 15:53:12,467][19063] Stopping RolloutWorker_w4... [2023-02-26 15:53:12,469][19063] Loop rollout_proc4_evt_loop terminating... [2023-02-26 15:53:12,477][00108] Component RolloutWorker_w6 stopped! [2023-02-26 15:53:12,482][19065] Stopping RolloutWorker_w6... [2023-02-26 15:53:12,482][19065] Loop rollout_proc6_evt_loop terminating... [2023-02-26 15:53:12,501][19044] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000857_3510272.pth [2023-02-26 15:53:12,504][00108] Component RolloutWorker_w0 stopped! [2023-02-26 15:53:12,507][19059] Stopping RolloutWorker_w0... [2023-02-26 15:53:12,514][19059] Loop rollout_proc0_evt_loop terminating... [2023-02-26 15:53:12,518][19044] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 15:53:12,801][00108] Component LearnerWorker_p0 stopped! [2023-02-26 15:53:12,804][00108] Waiting for process learner_proc0 to stop... [2023-02-26 15:53:12,808][19044] Stopping LearnerWorker_p0... [2023-02-26 15:53:12,809][19044] Loop learner_proc0_evt_loop terminating... 
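Every "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line in the run above is a throughput report over three sliding windows: environment frames gained in roughly the last 10, 60, and 300 seconds, divided by the elapsed time, next to the lifetime frame total and the raw sampler throughput. A small self-contained sketch of such a windowed meter, assuming it is fed the cumulative frame count at every report tick (names are illustrative):

    import time
    from collections import deque

    class FpsMeter:
        """Keep (timestamp, total_frames) samples and report FPS over sliding windows."""

        def __init__(self, max_window_sec: float = 300.0):
            self.samples = deque()  # (monotonic seconds, cumulative env frames)
            self.max_window = max_window_sec

        def record(self, total_frames: int) -> None:
            now = time.monotonic()
            self.samples.append((now, total_frames))
            # Drop samples older than the widest window we ever report on.
            while self.samples and now - self.samples[0][0] > self.max_window:
                self.samples.popleft()

        def fps(self, window_sec: float) -> float:
            if len(self.samples) < 2:
                return float("nan")  # not enough data yet
            now, frames_now = self.samples[-1]
            t_old, frames_old = now, frames_now
            for t, f in self.samples:  # oldest sample still inside the window
                if now - t <= window_sec:
                    t_old, frames_old = t, f
                    break
            if now - t_old <= 0:
                return float("nan")
            return (frames_now - frames_old) / (now - t_old)

    # meter.record(total_env_frames) each tick, then meter.fps(10), meter.fps(60), meter.fps(300)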
[2023-02-26 15:53:14,984][00108] Waiting for process inference_proc0-0 to join... [2023-02-26 15:53:15,565][00108] Waiting for process rollout_proc0 to join... [2023-02-26 15:53:16,627][00108] Waiting for process rollout_proc1 to join... [2023-02-26 15:53:16,631][00108] Waiting for process rollout_proc2 to join... [2023-02-26 15:53:16,633][00108] Waiting for process rollout_proc3 to join... [2023-02-26 15:53:16,634][00108] Waiting for process rollout_proc4 to join... [2023-02-26 15:53:16,635][00108] Waiting for process rollout_proc5 to join... [2023-02-26 15:53:16,636][00108] Waiting for process rollout_proc6 to join... [2023-02-26 15:53:16,641][00108] Waiting for process rollout_proc7 to join...
[2023-02-26 15:53:16,642][00108] Batcher 0 profile tree view:
batching: 25.0036, releasing_batches: 0.0244
[2023-02-26 15:53:16,644][00108] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0047
  wait_policy_total: 524.1958
update_model: 7.3770
  weight_update: 0.0016
one_step: 0.0025
  handle_policy_step: 492.3900
    deserialize: 14.4398, stack: 2.7594, obs_to_device_normalize: 109.6511, forward: 235.8743, send_messages: 25.4736
    prepare_outputs: 79.2628
      to_cpu: 49.2762
[2023-02-26 15:53:16,645][00108] Learner 0 profile tree view:
misc: 0.0060, prepare_batch: 15.0915
train: 74.7596
  epoch_init: 0.0170, minibatch_init: 0.0072, losses_postprocess: 0.5636, kl_divergence: 0.5629, after_optimizer: 32.9399
  calculate_losses: 26.4199
    losses_init: 0.0104, forward_head: 1.6640, bptt_initial: 17.4652, tail: 0.9630, advantages_returns: 0.2915, losses: 3.5520
    bptt: 2.1364
      bptt_forward_core: 2.0796
  update: 13.6765
    clip: 1.3928
[2023-02-26 15:53:16,650][00108] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4015, enqueue_policy_requests: 137.6260, env_step: 803.0299, overhead: 20.2798, complete_rollouts: 7.2303
save_policy_outputs: 19.3645
  split_output_tensors: 9.2609
[2023-02-26 15:53:16,653][00108] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3278, enqueue_policy_requests: 142.5598, env_step: 798.7355, overhead: 19.5091, complete_rollouts: 6.3191
save_policy_outputs: 18.8027
  split_output_tensors: 9.0417
[2023-02-26 15:53:16,655][00108] Loop Runner_EvtLoop terminating...
[2023-02-26 15:53:16,657][00108] Runner profile tree view:
main_loop: 1089.9621
[2023-02-26 15:53:16,658][00108] Collected {0: 4005888}, FPS: 3675.3 [2023-02-26 15:53:28,074][00108] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 15:53:28,079][00108] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 15:53:28,081][00108] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 15:53:28,085][00108] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 15:53:28,089][00108] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 15:53:28,091][00108] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 15:53:28,095][00108] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 15:53:28,096][00108] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 15:53:28,097][00108] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-26 15:53:28,099][00108] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-26 15:53:28,100][00108] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 15:53:28,102][00108] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 15:53:28,103][00108] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 15:53:28,105][00108] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 15:53:28,106][00108] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 15:53:28,150][00108] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 15:53:28,155][00108] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 15:53:28,158][00108] RunningMeanStd input shape: (1,) [2023-02-26 15:53:28,185][00108] ConvEncoder: input_channels=3 [2023-02-26 15:53:28,954][00108] Conv encoder output size: 512 [2023-02-26 15:53:28,957][00108] Policy head output size: 512 [2023-02-26 15:53:31,721][00108] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 15:53:32,944][00108] Num frames 100... [2023-02-26 15:53:33,059][00108] Num frames 200... [2023-02-26 15:53:33,170][00108] Num frames 300... [2023-02-26 15:53:33,286][00108] Num frames 400... [2023-02-26 15:53:33,399][00108] Num frames 500... [2023-02-26 15:53:33,515][00108] Num frames 600... [2023-02-26 15:53:33,629][00108] Num frames 700... [2023-02-26 15:53:33,749][00108] Num frames 800... [2023-02-26 15:53:33,868][00108] Num frames 900... [2023-02-26 15:53:33,980][00108] Num frames 1000... [2023-02-26 15:53:34,101][00108] Num frames 1100... [2023-02-26 15:53:34,216][00108] Num frames 1200... [2023-02-26 15:53:34,334][00108] Num frames 1300... [2023-02-26 15:53:34,452][00108] Num frames 1400... [2023-02-26 15:53:34,568][00108] Num frames 1500... [2023-02-26 15:53:34,691][00108] Num frames 1600... [2023-02-26 15:53:34,808][00108] Num frames 1700... [2023-02-26 15:53:34,921][00108] Num frames 1800... [2023-02-26 15:53:35,062][00108] Avg episode rewards: #0: 44.709, true rewards: #0: 18.710 [2023-02-26 15:53:35,064][00108] Avg episode reward: 44.709, avg true_objective: 18.710 [2023-02-26 15:53:35,099][00108] Num frames 1900... [2023-02-26 15:53:35,213][00108] Num frames 2000... [2023-02-26 15:53:35,335][00108] Num frames 2100... [2023-02-26 15:53:35,451][00108] Num frames 2200... [2023-02-26 15:53:35,565][00108] Num frames 2300... [2023-02-26 15:53:35,686][00108] Num frames 2400... [2023-02-26 15:53:35,800][00108] Num frames 2500... [2023-02-26 15:53:35,917][00108] Num frames 2600... [2023-02-26 15:53:36,035][00108] Num frames 2700... [2023-02-26 15:53:36,150][00108] Num frames 2800... [2023-02-26 15:53:36,267][00108] Num frames 2900... [2023-02-26 15:53:36,387][00108] Num frames 3000... [2023-02-26 15:53:36,502][00108] Num frames 3100... [2023-02-26 15:53:36,616][00108] Num frames 3200... [2023-02-26 15:53:36,735][00108] Num frames 3300... [2023-02-26 15:53:36,856][00108] Num frames 3400... [2023-02-26 15:53:36,968][00108] Num frames 3500... [2023-02-26 15:53:37,093][00108] Num frames 3600... [2023-02-26 15:53:37,223][00108] Num frames 3700... [2023-02-26 15:53:37,345][00108] Num frames 3800... [2023-02-26 15:53:37,469][00108] Num frames 3900... [2023-02-26 15:53:37,606][00108] Avg episode rewards: #0: 49.854, true rewards: #0: 19.855 [2023-02-26 15:53:37,608][00108] Avg episode reward: 49.854, avg true_objective: 19.855 [2023-02-26 15:53:37,646][00108] Num frames 4000... 
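The "Loading existing experiment configuration ... Overriding arg ... Adding new argument ..." block at 15:53:28 above shows the evaluation script's config handling: restore the exact training setup from config.json, then layer command-line flags on top, distinguishing flags that override a saved value from flags that did not exist at training time. A rough sketch of that merge (illustrative names, not the actual Sample Factory implementation); the episode frame counts continue below:

    import json

    def load_eval_config(config_path: str, cli_overrides: dict) -> dict:
        """Merge CLI overrides into a saved experiment configuration."""
        with open(config_path) as f:
            cfg = json.load(f)
        for key, value in cli_overrides.items():
            if key in cfg:
                print(f"Overriding arg '{key}' with value {value} passed from command line")
            else:
                print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
            cfg[key] = value
        return cfg

    # cfg = load_eval_config(
    #     "/content/train_dir/default_experiment/config.json",
    #     {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
    # )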
[2023-02-26 15:53:37,766][00108] Num frames 4100... [2023-02-26 15:53:37,886][00108] Num frames 4200... [2023-02-26 15:53:37,998][00108] Num frames 4300... [2023-02-26 15:53:38,119][00108] Num frames 4400... [2023-02-26 15:53:38,270][00108] Avg episode rewards: #0: 35.943, true rewards: #0: 14.943 [2023-02-26 15:53:38,271][00108] Avg episode reward: 35.943, avg true_objective: 14.943 [2023-02-26 15:53:38,296][00108] Num frames 4500... [2023-02-26 15:53:38,413][00108] Num frames 4600... [2023-02-26 15:53:38,528][00108] Num frames 4700... [2023-02-26 15:53:38,654][00108] Num frames 4800... [2023-02-26 15:53:38,777][00108] Avg episode rewards: #0: 28.895, true rewards: #0: 12.145 [2023-02-26 15:53:38,778][00108] Avg episode reward: 28.895, avg true_objective: 12.145 [2023-02-26 15:53:38,830][00108] Num frames 4900... [2023-02-26 15:53:38,949][00108] Num frames 5000... [2023-02-26 15:53:39,070][00108] Num frames 5100... [2023-02-26 15:53:39,181][00108] Num frames 5200... [2023-02-26 15:53:39,295][00108] Num frames 5300... [2023-02-26 15:53:39,413][00108] Num frames 5400... [2023-02-26 15:53:39,530][00108] Num frames 5500... [2023-02-26 15:53:39,645][00108] Num frames 5600... [2023-02-26 15:53:39,763][00108] Num frames 5700... [2023-02-26 15:53:39,882][00108] Num frames 5800... [2023-02-26 15:53:40,014][00108] Num frames 5900... [2023-02-26 15:53:40,133][00108] Num frames 6000... [2023-02-26 15:53:40,248][00108] Num frames 6100... [2023-02-26 15:53:40,362][00108] Num frames 6200... [2023-02-26 15:53:40,482][00108] Num frames 6300... [2023-02-26 15:53:40,629][00108] Num frames 6400... [2023-02-26 15:53:40,789][00108] Num frames 6500... [2023-02-26 15:53:40,955][00108] Num frames 6600... [2023-02-26 15:53:41,113][00108] Num frames 6700... [2023-02-26 15:53:41,283][00108] Num frames 6800... [2023-02-26 15:53:41,451][00108] Num frames 6900... [2023-02-26 15:53:41,606][00108] Avg episode rewards: #0: 33.715, true rewards: #0: 13.916 [2023-02-26 15:53:41,609][00108] Avg episode reward: 33.715, avg true_objective: 13.916 [2023-02-26 15:53:41,683][00108] Num frames 7000... [2023-02-26 15:53:41,845][00108] Num frames 7100... [2023-02-26 15:53:42,010][00108] Num frames 7200... [2023-02-26 15:53:42,171][00108] Num frames 7300... [2023-02-26 15:53:42,343][00108] Num frames 7400... [2023-02-26 15:53:42,509][00108] Num frames 7500... [2023-02-26 15:53:42,637][00108] Avg episode rewards: #0: 29.903, true rewards: #0: 12.570 [2023-02-26 15:53:42,640][00108] Avg episode reward: 29.903, avg true_objective: 12.570 [2023-02-26 15:53:42,735][00108] Num frames 7600... [2023-02-26 15:53:42,901][00108] Num frames 7700... [2023-02-26 15:53:43,072][00108] Num frames 7800... [2023-02-26 15:53:43,240][00108] Num frames 7900... [2023-02-26 15:53:43,340][00108] Avg episode rewards: #0: 26.608, true rewards: #0: 11.323 [2023-02-26 15:53:43,343][00108] Avg episode reward: 26.608, avg true_objective: 11.323 [2023-02-26 15:53:43,469][00108] Num frames 8000... [2023-02-26 15:53:43,631][00108] Num frames 8100... [2023-02-26 15:53:43,788][00108] Num frames 8200... [2023-02-26 15:53:43,953][00108] Num frames 8300... [2023-02-26 15:53:44,090][00108] Num frames 8400... [2023-02-26 15:53:44,209][00108] Num frames 8500... [2023-02-26 15:53:44,323][00108] Num frames 8600... [2023-02-26 15:53:44,442][00108] Num frames 8700... [2023-02-26 15:53:44,563][00108] Num frames 8800... [2023-02-26 15:53:44,677][00108] Num frames 8900... [2023-02-26 15:53:44,789][00108] Num frames 9000... [2023-02-26 15:53:44,912][00108] Num frames 9100... 
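Each "Avg episode rewards" line in this evaluation is a running mean over all episodes finished so far, not a per-episode score; "true rewards" / "avg true_objective" tracks the environment's own objective separately from the (possibly shaped) reward signal, at least on the usual reading of these fields. The bookkeeping amounts to the following sketch (hypothetical helper; the sample call reuses the first episode's numbers from the log above):

    episode_rewards: list[float] = []
    episode_true_rewards: list[float] = []

    def finish_episode(reward: float, true_reward: float) -> None:
        """Record a finished episode and print running means in the log's format."""
        episode_rewards.append(reward)
        episode_true_rewards.append(true_reward)
        avg_r = sum(episode_rewards) / len(episode_rewards)
        avg_t = sum(episode_true_rewards) / len(episode_true_rewards)
        print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")

    # finish_episode(44.709, 18.710)  # after one episode the running mean is the episode itself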
[2023-02-26 15:53:45,027][00108] Num frames 9200... [2023-02-26 15:53:45,147][00108] Num frames 9300... [2023-02-26 15:53:45,262][00108] Num frames 9400... [2023-02-26 15:53:45,388][00108] Avg episode rewards: #0: 28.452, true rewards: #0: 11.828 [2023-02-26 15:53:45,390][00108] Avg episode reward: 28.452, avg true_objective: 11.828 [2023-02-26 15:53:45,440][00108] Num frames 9500... [2023-02-26 15:53:45,551][00108] Num frames 9600... [2023-02-26 15:53:45,670][00108] Num frames 9700... [2023-02-26 15:53:45,783][00108] Num frames 9800... [2023-02-26 15:53:45,897][00108] Num frames 9900... [2023-02-26 15:53:46,018][00108] Num frames 10000... [2023-02-26 15:53:46,138][00108] Num frames 10100... [2023-02-26 15:53:46,252][00108] Num frames 10200... [2023-02-26 15:53:46,374][00108] Num frames 10300... [2023-02-26 15:53:46,492][00108] Num frames 10400... [2023-02-26 15:53:46,646][00108] Avg episode rewards: #0: 27.873, true rewards: #0: 11.651 [2023-02-26 15:53:46,648][00108] Avg episode reward: 27.873, avg true_objective: 11.651 [2023-02-26 15:53:46,669][00108] Num frames 10500... [2023-02-26 15:53:46,785][00108] Num frames 10600... [2023-02-26 15:53:46,908][00108] Num frames 10700... [2023-02-26 15:53:47,030][00108] Num frames 10800... [2023-02-26 15:53:47,154][00108] Num frames 10900... [2023-02-26 15:53:47,267][00108] Num frames 11000... [2023-02-26 15:53:47,384][00108] Num frames 11100... [2023-02-26 15:53:47,499][00108] Num frames 11200... [2023-02-26 15:53:47,610][00108] Num frames 11300... [2023-02-26 15:53:47,723][00108] Num frames 11400... [2023-02-26 15:53:47,837][00108] Num frames 11500... [2023-02-26 15:53:47,948][00108] Num frames 11600... [2023-02-26 15:53:48,065][00108] Num frames 11700... [2023-02-26 15:53:48,184][00108] Num frames 11800... [2023-02-26 15:53:48,295][00108] Num frames 11900... [2023-02-26 15:53:48,414][00108] Num frames 12000... [2023-02-26 15:53:48,528][00108] Num frames 12100... [2023-02-26 15:53:48,639][00108] Num frames 12200... [2023-02-26 15:53:48,757][00108] Num frames 12300... [2023-02-26 15:53:48,868][00108] Num frames 12400... [2023-02-26 15:53:48,996][00108] Avg episode rewards: #0: 29.958, true rewards: #0: 12.458 [2023-02-26 15:53:48,999][00108] Avg episode reward: 29.958, avg true_objective: 12.458 [2023-02-26 15:55:02,608][00108] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-26 15:57:24,931][00108] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 15:57:24,934][00108] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 15:57:24,936][00108] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 15:57:24,939][00108] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 15:57:24,942][00108] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 15:57:24,943][00108] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 15:57:24,946][00108] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-26 15:57:24,948][00108] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 15:57:24,949][00108] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2023-02-26 15:57:24,951][00108] Adding new argument 'hf_repository'='newbie4000/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-26 15:57:24,952][00108] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 15:57:24,953][00108] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 15:57:24,955][00108] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 15:57:24,956][00108] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 15:57:24,958][00108] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 15:57:24,986][00108] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 15:57:24,987][00108] RunningMeanStd input shape: (1,) [2023-02-26 15:57:25,002][00108] ConvEncoder: input_channels=3 [2023-02-26 15:57:25,037][00108] Conv encoder output size: 512 [2023-02-26 15:57:25,039][00108] Policy head output size: 512 [2023-02-26 15:57:25,059][00108] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 15:57:25,495][00108] Num frames 100... [2023-02-26 15:57:25,610][00108] Num frames 200... [2023-02-26 15:57:25,724][00108] Num frames 300... [2023-02-26 15:57:25,836][00108] Num frames 400... [2023-02-26 15:57:25,955][00108] Num frames 500... [2023-02-26 15:57:26,075][00108] Num frames 600... [2023-02-26 15:57:26,195][00108] Num frames 700... [2023-02-26 15:57:26,307][00108] Num frames 800... [2023-02-26 15:57:26,427][00108] Num frames 900... [2023-02-26 15:57:26,540][00108] Num frames 1000... [2023-02-26 15:57:26,650][00108] Num frames 1100... [2023-02-26 15:57:26,765][00108] Avg episode rewards: #0: 24.520, true rewards: #0: 11.520 [2023-02-26 15:57:26,768][00108] Avg episode reward: 24.520, avg true_objective: 11.520 [2023-02-26 15:57:26,825][00108] Num frames 1200... [2023-02-26 15:57:26,944][00108] Num frames 1300... [2023-02-26 15:57:27,061][00108] Num frames 1400... [2023-02-26 15:57:27,174][00108] Num frames 1500... [2023-02-26 15:57:27,288][00108] Num frames 1600... [2023-02-26 15:57:27,400][00108] Num frames 1700... [2023-02-26 15:57:27,512][00108] Num frames 1800... [2023-02-26 15:57:27,622][00108] Num frames 1900... [2023-02-26 15:57:27,735][00108] Num frames 2000... [2023-02-26 15:57:27,846][00108] Num frames 2100... [2023-02-26 15:57:27,972][00108] Num frames 2200... [2023-02-26 15:57:28,086][00108] Num frames 2300... [2023-02-26 15:57:28,207][00108] Num frames 2400... [2023-02-26 15:57:28,320][00108] Num frames 2500... [2023-02-26 15:57:28,445][00108] Num frames 2600... [2023-02-26 15:57:28,562][00108] Avg episode rewards: #0: 29.780, true rewards: #0: 13.280 [2023-02-26 15:57:28,563][00108] Avg episode reward: 29.780, avg true_objective: 13.280 [2023-02-26 15:57:28,618][00108] Num frames 2700... [2023-02-26 15:57:28,730][00108] Num frames 2800... [2023-02-26 15:57:28,846][00108] Num frames 2900... [2023-02-26 15:57:28,976][00108] Num frames 3000... [2023-02-26 15:57:29,102][00108] Num frames 3100... [2023-02-26 15:57:29,220][00108] Num frames 3200... [2023-02-26 15:57:29,334][00108] Num frames 3300... [2023-02-26 15:57:29,454][00108] Num frames 3400... [2023-02-26 15:57:29,566][00108] Num frames 3500... [2023-02-26 15:57:29,686][00108] Num frames 3600... [2023-02-26 15:57:29,804][00108] Num frames 3700... [2023-02-26 15:57:29,919][00108] Num frames 3800... 
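"Using frameskip 1 and render_action_repeat=4 for evaluation" (logged above for this second pass, as for the first) means every action the policy picks is repeated for four rendered frames with the rewards accumulated, so the policy effectively decides once per four of the frames counted in these ticks. A generic Gym-style action-repeat wrapper sketching the mechanism, not the VizDoom-specific implementation:

    import gym

    class ActionRepeat(gym.Wrapper):
        """Repeat each chosen action `repeat` times, summing the rewards."""

        def __init__(self, env: gym.Env, repeat: int = 4):
            super().__init__(env)
            self.repeat = repeat

        def step(self, action):
            total_reward, obs, done, info = 0.0, None, False, {}
            for _ in range(self.repeat):
                obs, reward, done, info = self.env.step(action)  # classic 4-tuple gym API
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info

    # env = ActionRepeat(make_doom_env(cfg), repeat=4)  # make_doom_env is a hypothetical factory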
[2023-02-26 15:57:30,053][00108] Num frames 3900... [2023-02-26 15:57:30,166][00108] Num frames 4000... [2023-02-26 15:57:30,281][00108] Num frames 4100... [2023-02-26 15:57:30,402][00108] Num frames 4200... [2023-02-26 15:57:30,518][00108] Num frames 4300... [2023-02-26 15:57:30,633][00108] Num frames 4400... [2023-02-26 15:57:30,743][00108] Num frames 4500... [2023-02-26 15:57:30,856][00108] Num frames 4600... [2023-02-26 15:57:30,974][00108] Num frames 4700... [2023-02-26 15:57:31,075][00108] Avg episode rewards: #0: 38.440, true rewards: #0: 15.773 [2023-02-26 15:57:31,077][00108] Avg episode reward: 38.440, avg true_objective: 15.773 [2023-02-26 15:57:31,153][00108] Num frames 4800... [2023-02-26 15:57:31,269][00108] Num frames 4900... [2023-02-26 15:57:31,382][00108] Num frames 5000... [2023-02-26 15:57:31,493][00108] Num frames 5100... [2023-02-26 15:57:31,607][00108] Num frames 5200... [2023-02-26 15:57:31,720][00108] Num frames 5300... [2023-02-26 15:57:31,830][00108] Num frames 5400... [2023-02-26 15:57:31,948][00108] Num frames 5500... [2023-02-26 15:57:32,068][00108] Num frames 5600... [2023-02-26 15:57:32,182][00108] Num frames 5700... [2023-02-26 15:57:32,290][00108] Num frames 5800... [2023-02-26 15:57:32,407][00108] Num frames 5900... [2023-02-26 15:57:32,522][00108] Num frames 6000... [2023-02-26 15:57:32,634][00108] Num frames 6100... [2023-02-26 15:57:32,753][00108] Num frames 6200... [2023-02-26 15:57:32,849][00108] Avg episode rewards: #0: 36.840, true rewards: #0: 15.590 [2023-02-26 15:57:32,850][00108] Avg episode reward: 36.840, avg true_objective: 15.590 [2023-02-26 15:57:32,925][00108] Num frames 6300... [2023-02-26 15:57:33,046][00108] Num frames 6400... [2023-02-26 15:57:33,163][00108] Num frames 6500... [2023-02-26 15:57:33,323][00108] Avg episode rewards: #0: 30.190, true rewards: #0: 13.190 [2023-02-26 15:57:33,326][00108] Avg episode reward: 30.190, avg true_objective: 13.190 [2023-02-26 15:57:33,335][00108] Num frames 6600... [2023-02-26 15:57:33,447][00108] Num frames 6700... [2023-02-26 15:57:33,565][00108] Num frames 6800... [2023-02-26 15:57:33,675][00108] Num frames 6900... [2023-02-26 15:57:33,800][00108] Num frames 7000... [2023-02-26 15:57:33,958][00108] Num frames 7100... [2023-02-26 15:57:34,125][00108] Num frames 7200... [2023-02-26 15:57:34,281][00108] Num frames 7300... [2023-02-26 15:57:34,436][00108] Num frames 7400... [2023-02-26 15:57:34,596][00108] Num frames 7500... [2023-02-26 15:57:34,757][00108] Num frames 7600... [2023-02-26 15:57:34,913][00108] Num frames 7700... [2023-02-26 15:57:35,090][00108] Avg episode rewards: #0: 29.298, true rewards: #0: 12.965 [2023-02-26 15:57:35,091][00108] Avg episode reward: 29.298, avg true_objective: 12.965 [2023-02-26 15:57:35,128][00108] Num frames 7800... [2023-02-26 15:57:35,287][00108] Num frames 7900... [2023-02-26 15:57:35,446][00108] Num frames 8000... [2023-02-26 15:57:35,600][00108] Num frames 8100... [2023-02-26 15:57:35,767][00108] Num frames 8200... [2023-02-26 15:57:35,916][00108] Num frames 8300... [2023-02-26 15:57:36,071][00108] Num frames 8400... [2023-02-26 15:57:36,240][00108] Num frames 8500... [2023-02-26 15:57:36,320][00108] Avg episode rewards: #0: 27.021, true rewards: #0: 12.164 [2023-02-26 15:57:36,322][00108] Avg episode reward: 27.021, avg true_objective: 12.164 [2023-02-26 15:57:36,455][00108] Num frames 8600... [2023-02-26 15:57:36,614][00108] Num frames 8700... [2023-02-26 15:57:36,773][00108] Num frames 8800... 
[2023-02-26 15:57:36,940][00108] Avg episode rewards: #0: 24.209, true rewards: #0: 11.084 [2023-02-26 15:57:36,942][00108] Avg episode reward: 24.209, avg true_objective: 11.084 [2023-02-26 15:57:37,003][00108] Num frames 8900... [2023-02-26 15:57:37,164][00108] Num frames 9000... [2023-02-26 15:57:37,321][00108] Num frames 9100... [2023-02-26 15:57:37,438][00108] Num frames 9200... [2023-02-26 15:57:37,550][00108] Num frames 9300... [2023-02-26 15:57:37,666][00108] Num frames 9400... [2023-02-26 15:57:37,782][00108] Num frames 9500... [2023-02-26 15:57:37,928][00108] Avg episode rewards: #0: 22.979, true rewards: #0: 10.646 [2023-02-26 15:57:37,929][00108] Avg episode reward: 22.979, avg true_objective: 10.646 [2023-02-26 15:57:37,958][00108] Num frames 9600... [2023-02-26 15:57:38,070][00108] Num frames 9700... [2023-02-26 15:57:38,186][00108] Num frames 9800... [2023-02-26 15:57:38,300][00108] Num frames 9900... [2023-02-26 15:57:38,411][00108] Num frames 10000... [2023-02-26 15:57:38,525][00108] Num frames 10100... [2023-02-26 15:57:38,650][00108] Num frames 10200... [2023-02-26 15:57:38,759][00108] Num frames 10300... [2023-02-26 15:57:38,867][00108] Num frames 10400... [2023-02-26 15:57:38,976][00108] Num frames 10500... [2023-02-26 15:57:39,086][00108] Num frames 10600... [2023-02-26 15:57:39,183][00108] Avg episode rewards: #0: 22.637, true rewards: #0: 10.637 [2023-02-26 15:57:39,185][00108] Avg episode reward: 22.637, avg true_objective: 10.637 [2023-02-26 15:58:42,870][00108] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
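"Replay video saved to /content/train_dir/default_experiment/replay.mp4!" closes each evaluation pass: the frames rendered across the ten episodes are encoded into a single mp4 (here with push_to_hub enabled and newbie4000/rl_course_vizdoom_health_gathering_supreme as the target repository). The log does not say which encoder the script uses; a minimal sketch with imageio, assuming a list of HxWx3 uint8 frames has already been collected:

    import imageio
    import numpy as np

    def save_replay(frames, path: str, fps: int = 35) -> None:
        """Encode rendered frames as an mp4 (requires the imageio-ffmpeg backend)."""
        writer = imageio.get_writer(path, fps=fps)
        for frame in frames:
            writer.append_data(np.asarray(frame, dtype=np.uint8))
        writer.close()
        print(f"Replay video saved to {path}!")

    # save_replay(collected_frames, "/content/train_dir/default_experiment/replay.mp4")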