[2024-09-30 00:25:00,956][1148693] Saving configuration to /home/luyang/workspace/rl/train_dir/default_experiment/config.json...
[2024-09-30 00:25:00,961][1148693] Rollout worker 0 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 1 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 2 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 3 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 4 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 5 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 6 uses device cpu
[2024-09-30 00:25:00,962][1148693] Rollout worker 7 uses device cpu
[2024-09-30 00:25:01,008][1148693] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:25:01,008][1148693] InferenceWorker_p0-w0: min num requests: 2
[2024-09-30 00:25:01,042][1148693] Starting all processes...
[2024-09-30 00:25:01,042][1148693] Starting process learner_proc0
[2024-09-30 00:25:02,676][1148693] Starting all processes...
[2024-09-30 00:25:02,680][1148981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:25:02,680][1148981] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-30 00:25:02,680][1148693] Starting process inference_proc0-0
[2024-09-30 00:25:02,680][1148693] Starting process rollout_proc0
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc1
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc2
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc3
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc4
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc5
[2024-09-30 00:25:02,686][1148693] Starting process rollout_proc6
[2024-09-30 00:25:02,686][1148693] Starting process rollout_proc7
[2024-09-30 00:25:02,712][1148981] Num visible devices: 1
[2024-09-30 00:25:02,719][1148981] Starting seed is not provided
[2024-09-30 00:25:02,719][1148981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:25:02,719][1148981] Initializing actor-critic model on device cuda:0
[2024-09-30 00:25:02,719][1148981] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:25:02,720][1148981] RunningMeanStd input shape: (1,)
[2024-09-30 00:25:02,729][1148981] ConvEncoder: input_channels=3
[2024-09-30 00:25:02,801][1148981] Conv encoder output size: 512
[2024-09-30 00:25:02,801][1148981] Policy head output size: 512
[2024-09-30 00:25:02,812][1148981] Created Actor Critic model with architecture:
[2024-09-30 00:25:02,813][1148981] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-30 00:25:02,951][1148981] Using optimizer
[2024-09-30 00:25:03,366][1148693] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 1148693], exiting...
[2024-09-30 00:25:03,366][1148693] Runner profile tree view:
main_loop: 2.3244
[2024-09-30 00:25:03,367][1148693] Collected {}, FPS: 0.0
[2024-09-30 00:25:03,367][1148981] Stopping Batcher_0...
[2024-09-30 00:25:03,368][1148981] Loop batcher_evt_loop terminating...
[2024-09-30 00:25:03,637][1148981] No checkpoints found
[2024-09-30 00:25:03,637][1148981] Did not load from checkpoint, starting from scratch!
[2024-09-30 00:25:03,637][1148981] Initialized policy 0 weights for model version 0
[2024-09-30 00:25:03,639][1148981] LearnerWorker_p0 finished initialization!
[2024-09-30 00:25:03,640][1148981] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-09-30 00:25:03,662][1148981] Stopping LearnerWorker_p0...
[2024-09-30 00:25:03,662][1148981] Loop learner_proc0_evt_loop terminating...
[2024-09-30 00:26:16,204][1149865] Saving configuration to /home/luyang/workspace/rl/train_dir/default_experiment/config.json...
[2024-09-30 00:26:16,209][1149865] Rollout worker 0 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 1 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 2 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 3 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 4 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 5 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 6 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 7 uses device cpu
[2024-09-30 00:26:16,252][1149865] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:16,252][1149865] InferenceWorker_p0-w0: min num requests: 2
[2024-09-30 00:26:16,286][1149865] Starting all processes...
[2024-09-30 00:26:16,286][1149865] Starting process learner_proc0
[2024-09-30 00:26:17,897][1149865] Starting all processes...
[2024-09-30 00:26:17,901][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:17,901][1150061] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-30 00:26:17,901][1149865] Starting process inference_proc0-0
[2024-09-30 00:26:17,901][1149865] Starting process rollout_proc0
[2024-09-30 00:26:17,901][1149865] Starting process rollout_proc1
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc2
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc3
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc4
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc5
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc6
[2024-09-30 00:26:17,903][1149865] Starting process rollout_proc7
[2024-09-30 00:26:17,953][1150061] Num visible devices: 1
[2024-09-30 00:26:17,959][1150061] Starting seed is not provided
[2024-09-30 00:26:17,959][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:17,959][1150061] Initializing actor-critic model on device cuda:0
[2024-09-30 00:26:17,959][1150061] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:26:17,960][1150061] RunningMeanStd input shape: (1,)
[2024-09-30 00:26:17,968][1150061] ConvEncoder: input_channels=3
[2024-09-30 00:26:18,041][1150061] Conv encoder output size: 512
[2024-09-30 00:26:18,041][1150061] Policy head output size: 512
[2024-09-30 00:26:18,052][1150061] Created Actor Critic model with architecture:
[2024-09-30 00:26:18,052][1150061] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-30 00:26:18,183][1150061] Using optimizer
[2024-09-30 00:26:18,816][1150061] Loading state from checkpoint /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-09-30 00:26:18,828][1150061] Loading model from checkpoint
[2024-09-30 00:26:18,829][1150061] Loaded experiment state at self.train_step=0, self.env_steps=0
[2024-09-30 00:26:18,829][1150061] Initialized policy 0 weights for model version 0
[2024-09-30 00:26:18,831][1150061] LearnerWorker_p0 finished initialization!
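For reference, here is a minimal PyTorch sketch of the network printed in the architecture dump above. The conv filter sizes are an assumption (Sample Factory's default convnet_simple filters: 32@8x8 stride 4, 64@4x4 stride 2, 128@3x3 stride 2); the log itself only confirms the (3, 72, 128) input, the 512-dim encoder output, the GRU(512, 512) core, and the 1-dim value / 5-dim action-logit heads.

```python
import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Rough equivalent of the ActorCriticSharedWeights module logged above.

    The conv filters are assumed (Sample Factory's default convnet_simple);
    only the 512-dim encoder/core sizes and the 1/5-dim heads come from the log.
    """

    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # Infer the flattened conv output size with a dummy forward pass.
        with torch.no_grad():
            conv_out = self.conv_head(torch.zeros(1, *obs_shape)).numel()
        self.mlp = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())  # -> 512, as logged
        self.core = nn.GRU(hidden, hidden)                    # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)             # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # 5 action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)   # GRU expects (seq, batch, feat)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

With a (3, 72, 128) input those assumed filters give a 128 x 3 x 6 = 2304-dim conv output, which the final Linear maps to the 512 reported by "Conv encoder output size: 512".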
[2024-09-30 00:26:18,831][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:19,422][1150142] Worker 3 uses CPU cores [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
[2024-09-30 00:26:19,449][1150140] Worker 7 uses CPU cores [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
[2024-09-30 00:26:19,451][1150144] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-09-30 00:26:19,456][1150145] Worker 6 uses CPU cores [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83]
[2024-09-30 00:26:19,456][1150137] Worker 5 uses CPU cores [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]
[2024-09-30 00:26:19,462][1150141] Worker 4 uses CPU cores [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[2024-09-30 00:26:19,465][1150143] Worker 1 uses CPU cores [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[2024-09-30 00:26:19,466][1150138] Worker 2 uses CPU cores [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
[2024-09-30 00:26:19,483][1150139] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:19,484][1150139] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-30 00:26:19,545][1150139] Num visible devices: 1
[2024-09-30 00:26:19,557][1149865] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-30 00:26:19,639][1150139] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:26:19,640][1150139] RunningMeanStd input shape: (1,)
[2024-09-30 00:26:19,648][1150139] ConvEncoder: input_channels=3
[2024-09-30 00:26:19,720][1150139] Conv encoder output size: 512
[2024-09-30 00:26:19,720][1150139] Policy head output size: 512
[2024-09-30 00:26:19,751][1149865] Inference worker 0-0 is ready!
[2024-09-30 00:26:19,751][1149865] All inference workers are ready! Signal rollout workers to start!
[2024-09-30 00:26:19,776][1150144] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,776][1150141] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,777][1150142] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,777][1150138] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,777][1150145] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,781][1150140] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,785][1150137] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,791][1150143] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:20,015][1150141] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,019][1150142] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,020][1150145] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,020][1150138] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,021][1150140] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,028][1150137] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,226][1150141] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,233][1150142] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,233][1150145] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,239][1150137] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,271][1150143] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,481][1150143] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,496][1150145] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,508][1150142] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,739][1150141] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,745][1150142] Decorrelating experience for 96 frames...
[2024-09-30 00:26:20,759][1150137] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,987][1150141] Decorrelating experience for 96 frames...
[2024-09-30 00:26:20,991][1150137] Decorrelating experience for 96 frames...
[2024-09-30 00:26:20,993][1150143] Decorrelating experience for 64 frames...
[2024-09-30 00:26:21,227][1150143] Decorrelating experience for 96 frames...
[2024-09-30 00:26:21,234][1150145] Decorrelating experience for 96 frames...
[2024-09-30 00:26:21,489][1150138] Decorrelating experience for 32 frames...
[2024-09-30 00:26:21,652][1150061] Signal inference workers to stop experience collection...
[2024-09-30 00:26:21,655][1150139] InferenceWorker_p0-w0: stopping experience collection
[2024-09-30 00:26:21,743][1150140] Decorrelating experience for 32 frames...
[2024-09-30 00:26:21,758][1150138] Decorrelating experience for 64 frames...
[2024-09-30 00:26:21,995][1150138] Decorrelating experience for 96 frames...
[2024-09-30 00:26:22,002][1150140] Decorrelating experience for 64 frames...
[2024-09-30 00:26:22,237][1150140] Decorrelating experience for 96 frames...
[2024-09-30 00:26:22,624][1150061] Signal inference workers to resume experience collection...
[2024-09-30 00:26:22,624][1150139] InferenceWorker_p0-w0: resuming experience collection
[2024-09-30 00:26:23,854][1150139] Updated weights for policy 0, policy_version 10 (0.0128)
[2024-09-30 00:26:24,557][1149865] Fps is (10 sec: 12288.1, 60 sec: 12288.1, 300 sec: 12288.1). Total num frames: 61440. Throughput: 0: 484.0. Samples: 2420. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-30 00:26:24,557][1149865] Avg episode reward: [(0, '4.453')]
[2024-09-30 00:26:25,059][1150139] Updated weights for policy 0, policy_version 20 (0.0006)
[2024-09-30 00:26:26,154][1150139] Updated weights for policy 0, policy_version 30 (0.0006)
[2024-09-30 00:26:27,304][1150139] Updated weights for policy 0, policy_version 40 (0.0006)
[2024-09-30 00:26:28,424][1150139] Updated weights for policy 0, policy_version 50 (0.0006)
[2024-09-30 00:26:29,557][1149865] Fps is (10 sec: 24166.4, 60 sec: 24166.4, 300 sec: 24166.4). Total num frames: 241664. Throughput: 0: 5481.2. Samples: 54812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-30 00:26:29,557][1149865] Avg episode reward: [(0, '4.420')]
[2024-09-30 00:26:29,562][1150061] Saving new best policy, reward=4.420!
[2024-09-30 00:26:29,562][1150139] Updated weights for policy 0, policy_version 60 (0.0005)
[2024-09-30 00:26:30,697][1150139] Updated weights for policy 0, policy_version 70 (0.0005)
[2024-09-30 00:26:31,899][1150139] Updated weights for policy 0, policy_version 80 (0.0005)
[2024-09-30 00:26:33,041][1150139] Updated weights for policy 0, policy_version 90 (0.0005)
[2024-09-30 00:26:34,164][1150139] Updated weights for policy 0, policy_version 100 (0.0006)
[2024-09-30 00:26:34,557][1149865] Fps is (10 sec: 36044.6, 60 sec: 28125.8, 300 sec: 28125.8). Total num frames: 421888. Throughput: 0: 5422.5. Samples: 81338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-30 00:26:34,557][1149865] Avg episode reward: [(0, '4.360')]
[2024-09-30 00:26:35,313][1150139] Updated weights for policy 0, policy_version 110 (0.0006)
[2024-09-30 00:26:36,243][1149865] Heartbeat connected on Batcher_0
[2024-09-30 00:26:36,247][1149865] Heartbeat connected on LearnerWorker_p0
[2024-09-30 00:26:36,254][1149865] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-30 00:26:36,261][1149865] Heartbeat connected on RolloutWorker_w1
[2024-09-30 00:26:36,265][1149865] Heartbeat connected on RolloutWorker_w2
[2024-09-30 00:26:36,270][1149865] Heartbeat connected on RolloutWorker_w3
[2024-09-30 00:26:36,273][1149865] Heartbeat connected on RolloutWorker_w4
[2024-09-30 00:26:36,278][1149865] Heartbeat connected on RolloutWorker_w5
[2024-09-30 00:26:36,283][1149865] Heartbeat connected on RolloutWorker_w6
[2024-09-30 00:26:36,286][1149865] Heartbeat connected on RolloutWorker_w7
[2024-09-30 00:26:36,388][1150139] Updated weights for policy 0, policy_version 120 (0.0006)
[2024-09-30 00:26:37,485][1150139] Updated weights for policy 0, policy_version 130 (0.0005)
[2024-09-30 00:26:38,623][1150139] Updated weights for policy 0, policy_version 140 (0.0005)
[2024-09-30 00:26:39,557][1149865] Fps is (10 sec: 36454.4, 60 sec: 30310.4, 300 sec: 30310.4). Total num frames: 606208. Throughput: 0: 6816.1. Samples: 136322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:26:39,557][1149865] Avg episode reward: [(0, '4.362')]
[2024-09-30 00:26:39,793][1150139] Updated weights for policy 0, policy_version 150 (0.0005)
[2024-09-30 00:26:40,942][1150139] Updated weights for policy 0, policy_version 160 (0.0006)
[2024-09-30 00:26:42,120][1150139] Updated weights for policy 0, policy_version 170 (0.0006)
[2024-09-30 00:26:43,255][1150139] Updated weights for policy 0, policy_version 180 (0.0006)
[2024-09-30 00:26:44,363][1150139] Updated weights for policy 0, policy_version 190 (0.0006)
[2024-09-30 00:26:44,557][1149865] Fps is (10 sec: 36044.9, 60 sec: 31293.4, 300 sec: 31293.4). Total num frames: 782336. Throughput: 0: 7583.6. Samples: 189590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-30 00:26:44,557][1149865] Avg episode reward: [(0, '4.718')]
[2024-09-30 00:26:44,560][1150061] Saving new best policy, reward=4.718!
[2024-09-30 00:26:45,498][1150139] Updated weights for policy 0, policy_version 200 (0.0006)
[2024-09-30 00:26:46,658][1150139] Updated weights for policy 0, policy_version 210 (0.0005)
[2024-09-30 00:26:47,749][1150139] Updated weights for policy 0, policy_version 220 (0.0005)
[2024-09-30 00:26:48,879][1150139] Updated weights for policy 0, policy_version 230 (0.0006)
[2024-09-30 00:26:49,557][1149865] Fps is (10 sec: 36044.5, 60 sec: 32221.8, 300 sec: 32221.8). Total num frames: 966656. Throughput: 0: 7214.5. Samples: 216436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:26:49,558][1149865] Avg episode reward: [(0, '4.984')]
[2024-09-30 00:26:49,558][1150061] Saving new best policy, reward=4.984!
[2024-09-30 00:26:49,996][1150139] Updated weights for policy 0, policy_version 240 (0.0006)
[2024-09-30 00:26:51,078][1150139] Updated weights for policy 0, policy_version 250 (0.0005)
[2024-09-30 00:26:52,165][1150139] Updated weights for policy 0, policy_version 260 (0.0005)
[2024-09-30 00:26:53,360][1150139] Updated weights for policy 0, policy_version 270 (0.0006)
[2024-09-30 00:26:54,557][1149865] Fps is (10 sec: 36044.3, 60 sec: 32650.8, 300 sec: 32650.8). Total num frames: 1142784. Throughput: 0: 7775.5. Samples: 272144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:26:54,558][1149865] Avg episode reward: [(0, '6.825')]
[2024-09-30 00:26:54,561][1150061] Saving new best policy, reward=6.825!
[2024-09-30 00:26:54,627][1150139] Updated weights for policy 0, policy_version 280 (0.0006)
[2024-09-30 00:26:55,804][1150139] Updated weights for policy 0, policy_version 290 (0.0006)
[2024-09-30 00:26:56,921][1150139] Updated weights for policy 0, policy_version 300 (0.0006)
[2024-09-30 00:26:58,061][1150139] Updated weights for policy 0, policy_version 310 (0.0006)
[2024-09-30 00:26:59,306][1150139] Updated weights for policy 0, policy_version 320 (0.0006)
[2024-09-30 00:26:59,557][1149865] Fps is (10 sec: 35225.8, 60 sec: 32972.8, 300 sec: 32972.8). Total num frames: 1318912. Throughput: 0: 8094.2. Samples: 323770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:26:59,557][1149865] Avg episode reward: [(0, '7.969')]
[2024-09-30 00:26:59,558][1150061] Saving new best policy, reward=7.969!
[2024-09-30 00:27:00,379][1150139] Updated weights for policy 0, policy_version 330 (0.0006)
[2024-09-30 00:27:01,493][1150139] Updated weights for policy 0, policy_version 340 (0.0006)
[2024-09-30 00:27:02,648][1150139] Updated weights for policy 0, policy_version 350 (0.0005)
[2024-09-30 00:27:03,846][1150139] Updated weights for policy 0, policy_version 360 (0.0006)
[2024-09-30 00:27:04,557][1149865] Fps is (10 sec: 35635.7, 60 sec: 33314.1, 300 sec: 33314.1). Total num frames: 1499136. Throughput: 0: 7803.8. Samples: 351172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:27:04,557][1149865] Avg episode reward: [(0, '9.395')]
[2024-09-30 00:27:04,560][1150061] Saving new best policy, reward=9.395!
[2024-09-30 00:27:04,958][1150139] Updated weights for policy 0, policy_version 370 (0.0005)
[2024-09-30 00:27:06,158][1150139] Updated weights for policy 0, policy_version 380 (0.0006)
[2024-09-30 00:27:07,341][1150139] Updated weights for policy 0, policy_version 390 (0.0006)
[2024-09-30 00:27:08,471][1150139] Updated weights for policy 0, policy_version 400 (0.0006)
[2024-09-30 00:27:09,550][1150139] Updated weights for policy 0, policy_version 410 (0.0005)
[2024-09-30 00:27:09,557][1149865] Fps is (10 sec: 36045.1, 60 sec: 33587.2, 300 sec: 33587.2). Total num frames: 1679360. Throughput: 0: 8916.3. Samples: 403652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:27:09,557][1149865] Avg episode reward: [(0, '10.451')]
[2024-09-30 00:27:09,558][1150061] Saving new best policy, reward=10.451!
[2024-09-30 00:27:10,625][1150139] Updated weights for policy 0, policy_version 420 (0.0005)
[2024-09-30 00:27:11,753][1150139] Updated weights for policy 0, policy_version 430 (0.0006)
[2024-09-30 00:27:12,906][1150139] Updated weights for policy 0, policy_version 440 (0.0005)
[2024-09-30 00:27:14,043][1150139] Updated weights for policy 0, policy_version 450 (0.0005)
[2024-09-30 00:27:14,557][1149865] Fps is (10 sec: 36044.8, 60 sec: 33810.6, 300 sec: 33810.6). Total num frames: 1859584. Throughput: 0: 8974.4. Samples: 458662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:27:14,557][1149865] Avg episode reward: [(0, '13.438')]
[2024-09-30 00:27:14,560][1150061] Saving new best policy, reward=13.438!
[2024-09-30 00:27:15,159][1150139] Updated weights for policy 0, policy_version 460 (0.0006)
[2024-09-30 00:27:16,224][1150139] Updated weights for policy 0, policy_version 470 (0.0006)
[2024-09-30 00:27:17,339][1150139] Updated weights for policy 0, policy_version 480 (0.0006)
[2024-09-30 00:27:18,411][1150139] Updated weights for policy 0, policy_version 490 (0.0006)
[2024-09-30 00:27:19,490][1150139] Updated weights for policy 0, policy_version 500 (0.0006)
[2024-09-30 00:27:19,557][1149865] Fps is (10 sec: 36863.4, 60 sec: 34133.3, 300 sec: 34133.3). Total num frames: 2048000. Throughput: 0: 9008.8. Samples: 486736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-30 00:27:19,558][1149865] Avg episode reward: [(0, '15.719')]
[2024-09-30 00:27:19,558][1150061] Saving new best policy, reward=15.719!
[2024-09-30 00:27:20,560][1150139] Updated weights for policy 0, policy_version 510 (0.0006)
[2024-09-30 00:27:21,675][1150139] Updated weights for policy 0, policy_version 520 (0.0006)
[2024-09-30 00:27:22,733][1150139] Updated weights for policy 0, policy_version 530 (0.0006)
[2024-09-30 00:27:23,821][1150139] Updated weights for policy 0, policy_version 540 (0.0006)
[2024-09-30 00:27:24,557][1149865] Fps is (10 sec: 37683.2, 60 sec: 36249.6, 300 sec: 34406.4). Total num frames: 2236416. Throughput: 0: 9047.6. Samples: 543462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-30 00:27:24,557][1149865] Avg episode reward: [(0, '18.072')]
[2024-09-30 00:27:24,569][1150061] Saving new best policy, reward=18.072!
[2024-09-30 00:27:24,893][1150139] Updated weights for policy 0, policy_version 550 (0.0006)
[2024-09-30 00:27:25,971][1150139] Updated weights for policy 0, policy_version 560 (0.0005)
[2024-09-30 00:27:27,037][1150139] Updated weights for policy 0, policy_version 570 (0.0006)
[2024-09-30 00:27:28,155][1150139] Updated weights for policy 0, policy_version 580 (0.0005)
[2024-09-30 00:27:29,272][1150139] Updated weights for policy 0, policy_version 590 (0.0006)
[2024-09-30 00:27:29,557][1149865] Fps is (10 sec: 37683.7, 60 sec: 36386.1, 300 sec: 34640.5). Total num frames: 2424832. Throughput: 0: 9118.1. Samples: 599906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:29,557][1149865] Avg episode reward: [(0, '20.999')]
[2024-09-30 00:27:29,558][1150061] Saving new best policy, reward=20.999!
[2024-09-30 00:27:30,323][1150139] Updated weights for policy 0, policy_version 600 (0.0006)
[2024-09-30 00:27:31,407][1150139] Updated weights for policy 0, policy_version 610 (0.0006)
[2024-09-30 00:27:32,467][1150139] Updated weights for policy 0, policy_version 620 (0.0006)
[2024-09-30 00:27:33,593][1150139] Updated weights for policy 0, policy_version 630 (0.0006)
[2024-09-30 00:27:34,557][1149865] Fps is (10 sec: 37683.3, 60 sec: 36522.7, 300 sec: 34843.3). Total num frames: 2613248. Throughput: 0: 9160.0. Samples: 628636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-30 00:27:34,557][1149865] Avg episode reward: [(0, '20.389')]
[2024-09-30 00:27:34,683][1150139] Updated weights for policy 0, policy_version 640 (0.0005)
[2024-09-30 00:27:35,755][1150139] Updated weights for policy 0, policy_version 650 (0.0006)
[2024-09-30 00:27:36,830][1150139] Updated weights for policy 0, policy_version 660 (0.0006)
[2024-09-30 00:27:37,890][1150139] Updated weights for policy 0, policy_version 670 (0.0006)
[2024-09-30 00:27:38,971][1150139] Updated weights for policy 0, policy_version 680 (0.0006)
[2024-09-30 00:27:39,557][1149865] Fps is (10 sec: 38092.7, 60 sec: 36659.2, 300 sec: 35072.0). Total num frames: 2805760. Throughput: 0: 9181.5. Samples: 685308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:39,557][1149865] Avg episode reward: [(0, '19.388')]
[2024-09-30 00:27:40,031][1150139] Updated weights for policy 0, policy_version 690 (0.0006)
[2024-09-30 00:27:41,100][1150139] Updated weights for policy 0, policy_version 700 (0.0005)
[2024-09-30 00:27:42,153][1150139] Updated weights for policy 0, policy_version 710 (0.0006)
[2024-09-30 00:27:43,230][1150139] Updated weights for policy 0, policy_version 720 (0.0006)
[2024-09-30 00:27:44,312][1150139] Updated weights for policy 0, policy_version 730 (0.0006)
[2024-09-30 00:27:44,557][1149865] Fps is (10 sec: 38502.4, 60 sec: 36932.3, 300 sec: 35273.8). Total num frames: 2998272. Throughput: 0: 9311.9. Samples: 742806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:44,557][1149865] Avg episode reward: [(0, '22.197')]
[2024-09-30 00:27:44,560][1150061] Saving new best policy, reward=22.197!
[2024-09-30 00:27:45,376][1150139] Updated weights for policy 0, policy_version 740 (0.0006)
[2024-09-30 00:27:46,438][1150139] Updated weights for policy 0, policy_version 750 (0.0006)
[2024-09-30 00:27:47,515][1150139] Updated weights for policy 0, policy_version 760 (0.0005)
[2024-09-30 00:27:48,572][1150139] Updated weights for policy 0, policy_version 770 (0.0006)
[2024-09-30 00:27:49,557][1149865] Fps is (10 sec: 38502.4, 60 sec: 37068.9, 300 sec: 35453.2). Total num frames: 3190784. Throughput: 0: 9343.1. Samples: 771610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-30 00:27:49,557][1149865] Avg episode reward: [(0, '21.389')]
[2024-09-30 00:27:49,645][1150139] Updated weights for policy 0, policy_version 780 (0.0006)
[2024-09-30 00:27:50,716][1150139] Updated weights for policy 0, policy_version 790 (0.0005)
[2024-09-30 00:27:51,800][1150139] Updated weights for policy 0, policy_version 800 (0.0006)
[2024-09-30 00:27:52,845][1150139] Updated weights for policy 0, policy_version 810 (0.0006)
[2024-09-30 00:27:53,913][1150139] Updated weights for policy 0, policy_version 820 (0.0005)
[2024-09-30 00:27:54,557][1149865] Fps is (10 sec: 38092.7, 60 sec: 37273.7, 300 sec: 35570.5). Total num frames: 3379200. Throughput: 0: 9454.0. Samples: 829084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-30 00:27:54,557][1149865] Avg episode reward: [(0, '25.173')]
[2024-09-30 00:27:54,560][1150061] Saving new best policy, reward=25.173!
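A quick sanity check on the "Fps is (10 sec: ...)" lines above: the 10-second figure is simply the frame-count delta between two report lines 10 s apart. Using the totals logged at 00:27:44 (2998272 frames) and 00:27:54 (3379200 frames):

```python
# Frame totals from the two "Fps is ..." report lines 10 s apart (00:27:44 and 00:27:54).
frames_t0, frames_t1 = 2998272, 3379200
print((frames_t1 - frames_t0) / 10.0)  # 38092.8, matching the logged "10 sec: 38092.7"
                                       # up to timer jitter
```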
[2024-09-30 00:27:54,982][1150139] Updated weights for policy 0, policy_version 830 (0.0006)
[2024-09-30 00:27:56,101][1150139] Updated weights for policy 0, policy_version 840 (0.0006)
[2024-09-30 00:27:57,180][1150139] Updated weights for policy 0, policy_version 850 (0.0005)
[2024-09-30 00:27:58,258][1150139] Updated weights for policy 0, policy_version 860 (0.0006)
[2024-09-30 00:27:59,351][1150139] Updated weights for policy 0, policy_version 870 (0.0005)
[2024-09-30 00:27:59,557][1149865] Fps is (10 sec: 38092.9, 60 sec: 37546.7, 300 sec: 35717.1). Total num frames: 3571712. Throughput: 0: 9494.8. Samples: 885926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:59,557][1149865] Avg episode reward: [(0, '21.345')]
[2024-09-30 00:28:00,433][1150139] Updated weights for policy 0, policy_version 880 (0.0006)
[2024-09-30 00:28:01,499][1150139] Updated weights for policy 0, policy_version 890 (0.0006)
[2024-09-30 00:28:02,611][1150139] Updated weights for policy 0, policy_version 900 (0.0006)
[2024-09-30 00:28:03,681][1150139] Updated weights for policy 0, policy_version 910 (0.0006)
[2024-09-30 00:28:04,557][1149865] Fps is (10 sec: 38092.8, 60 sec: 37683.2, 300 sec: 35810.7). Total num frames: 3760128. Throughput: 0: 9499.9. Samples: 914232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:28:04,557][1149865] Avg episode reward: [(0, '25.023')]
[2024-09-30 00:28:04,751][1150139] Updated weights for policy 0, policy_version 920 (0.0006)
[2024-09-30 00:28:05,834][1150139] Updated weights for policy 0, policy_version 930 (0.0006)
[2024-09-30 00:28:07,053][1150139] Updated weights for policy 0, policy_version 940 (0.0006)
[2024-09-30 00:28:08,307][1150139] Updated weights for policy 0, policy_version 950 (0.0006)
[2024-09-30 00:28:09,418][1150139] Updated weights for policy 0, policy_version 960 (0.0006)
[2024-09-30 00:28:09,557][1149865] Fps is (10 sec: 36453.7, 60 sec: 37614.8, 300 sec: 35784.1). Total num frames: 3936256. Throughput: 0: 9452.1. Samples: 968808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:28:09,558][1149865] Avg episode reward: [(0, '26.893')]
[2024-09-30 00:28:09,558][1150061] Saving new best policy, reward=26.893!
[2024-09-30 00:28:10,576][1150139] Updated weights for policy 0, policy_version 970 (0.0006)
[2024-09-30 00:28:11,497][1149865] Component Batcher_0 stopped!
[2024-09-30 00:28:11,497][1150061] Stopping Batcher_0...
[2024-09-30 00:28:11,497][1149865] Component RolloutWorker_w0 process died already! Don't wait for it.
[2024-09-30 00:28:11,497][1150061] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-30 00:28:11,498][1150061] Loop batcher_evt_loop terminating...
[2024-09-30 00:28:11,513][1150139] Weights refcount: 2 0
[2024-09-30 00:28:11,514][1150139] Stopping InferenceWorker_p0-w0...
[2024-09-30 00:28:11,514][1150139] Loop inference_proc0-0_evt_loop terminating...
[2024-09-30 00:28:11,514][1149865] Component InferenceWorker_p0-w0 stopped!
[2024-09-30 00:28:11,527][1150138] Stopping RolloutWorker_w2...
[2024-09-30 00:28:11,527][1149865] Component RolloutWorker_w2 stopped!
[2024-09-30 00:28:11,528][1150138] Loop rollout_proc2_evt_loop terminating...
[2024-09-30 00:28:11,530][1150142] Stopping RolloutWorker_w3...
[2024-09-30 00:28:11,530][1149865] Component RolloutWorker_w3 stopped!
[2024-09-30 00:28:11,530][1150142] Loop rollout_proc3_evt_loop terminating...
[2024-09-30 00:28:11,531][1149865] Component RolloutWorker_w5 stopped!
[2024-09-30 00:28:11,531][1150137] Stopping RolloutWorker_w5...
[2024-09-30 00:28:11,531][1149865] Component RolloutWorker_w6 stopped!
[2024-09-30 00:28:11,531][1150145] Stopping RolloutWorker_w6...
[2024-09-30 00:28:11,531][1150137] Loop rollout_proc5_evt_loop terminating...
[2024-09-30 00:28:11,531][1150145] Loop rollout_proc6_evt_loop terminating...
[2024-09-30 00:28:11,532][1149865] Component RolloutWorker_w1 stopped!
[2024-09-30 00:28:11,532][1150143] Stopping RolloutWorker_w1...
[2024-09-30 00:28:11,533][1150143] Loop rollout_proc1_evt_loop terminating...
[2024-09-30 00:28:11,533][1149865] Component RolloutWorker_w4 stopped!
[2024-09-30 00:28:11,533][1150141] Stopping RolloutWorker_w4...
[2024-09-30 00:28:11,533][1150141] Loop rollout_proc4_evt_loop terminating...
[2024-09-30 00:28:11,536][1149865] Component RolloutWorker_w7 stopped!
[2024-09-30 00:28:11,536][1150140] Stopping RolloutWorker_w7...
[2024-09-30 00:28:11,536][1150140] Loop rollout_proc7_evt_loop terminating...
[2024-09-30 00:28:11,548][1150061] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-30 00:28:11,677][1150061] Stopping LearnerWorker_p0...
[2024-09-30 00:28:11,677][1150061] Loop learner_proc0_evt_loop terminating...
[2024-09-30 00:28:11,677][1149865] Component LearnerWorker_p0 stopped!
[2024-09-30 00:28:11,678][1149865] Waiting for process learner_proc0 to stop...
[2024-09-30 00:28:12,213][1149865] Waiting for process inference_proc0-0 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc0 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc1 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc2 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc3 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc4 to join...
[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc5 to join...
[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc6 to join...
[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc7 to join...
[2024-09-30 00:28:12,215][1149865] Batcher 0 profile tree view:
batching: 8.1702, releasing_batches: 0.0148
[2024-09-30 00:28:12,215][1149865] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 2.2430
update_model: 1.6718
  weight_update: 0.0006
one_step: 0.0013
  handle_policy_step: 101.6105
    deserialize: 4.2527, stack: 0.5251, obs_to_device_normalize: 21.3149, forward: 52.1177, send_messages: 6.7725
    prepare_outputs: 11.8901
      to_cpu: 6.4354
[2024-09-30 00:28:12,216][1149865] Learner 0 profile tree view:
misc: 0.0031, prepare_batch: 4.0428
train: 10.3860
  epoch_init: 0.0033, minibatch_init: 0.0037, losses_postprocess: 0.1662, kl_divergence: 0.2113, after_optimizer: 0.8304
  calculate_losses: 4.6270
    losses_init: 0.0020, forward_head: 0.3762, bptt_initial: 2.3802, tail: 0.3318, advantages_returns: 0.0873, losses: 0.6229
    bptt: 0.7209
      bptt_forward_core: 0.6909
  update: 4.3204
    clip: 0.4495
[2024-09-30 00:28:12,216][1149865] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0819, enqueue_policy_requests: 4.5546, env_step: 67.0989, overhead: 3.2408, complete_rollouts: 0.1226
save_policy_outputs: 5.6070
  split_output_tensors: 1.8787
[2024-09-30 00:28:12,216][1149865] Loop Runner_EvtLoop terminating...
[2024-09-30 00:28:12,216][1149865] Runner profile tree view:
main_loop: 115.9303
[2024-09-30 00:28:12,216][1149865] Collected {0: 4005888}, FPS: 34554.3
[2024-09-30 00:28:12,419][1149865] Loading existing experiment configuration from /home/luyang/workspace/rl/train_dir/default_experiment/config.json
[2024-09-30 00:28:12,419][1149865] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-30 00:28:12,420][1149865] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'hf_repository'='esperesa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-30 00:28:12,441][1149865] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:28:12,443][1149865] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:28:12,443][1149865] RunningMeanStd input shape: (1,)
[2024-09-30 00:28:12,452][1149865] ConvEncoder: input_channels=3
[2024-09-30 00:28:12,522][1149865] Conv encoder output size: 512
[2024-09-30 00:28:12,522][1149865] Policy head output size: 512
[2024-09-30 00:28:12,681][1149865] Loading state from checkpoint /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-30 00:28:13,271][1149865] Num frames 100...
[2024-09-30 00:28:13,350][1149865] Num frames 200...
[2024-09-30 00:28:13,427][1149865] Num frames 300...
[2024-09-30 00:28:13,503][1149865] Num frames 400...
[2024-09-30 00:28:13,581][1149865] Num frames 500...
[2024-09-30 00:28:13,659][1149865] Num frames 600...
[2024-09-30 00:28:13,738][1149865] Num frames 700...
[2024-09-30 00:28:13,815][1149865] Num frames 800...
[2024-09-30 00:28:13,894][1149865] Num frames 900...
[2024-09-30 00:28:13,973][1149865] Num frames 1000...
[2024-09-30 00:28:14,052][1149865] Num frames 1100...
[2024-09-30 00:28:14,132][1149865] Num frames 1200...
[2024-09-30 00:28:14,208][1149865] Num frames 1300...
[2024-09-30 00:28:14,285][1149865] Num frames 1400...
[2024-09-30 00:28:14,364][1149865] Num frames 1500...
[2024-09-30 00:28:14,445][1149865] Avg episode rewards: #0: 34.360, true rewards: #0: 15.360
[2024-09-30 00:28:14,445][1149865] Avg episode reward: 34.360, avg true_objective: 15.360
[2024-09-30 00:28:14,497][1149865] Num frames 1600...
[2024-09-30 00:28:14,575][1149865] Num frames 1700...
[2024-09-30 00:28:14,654][1149865] Num frames 1800...
[2024-09-30 00:28:14,733][1149865] Num frames 1900...
[2024-09-30 00:28:14,811][1149865] Num frames 2000...
[2024-09-30 00:28:14,891][1149865] Num frames 2100...
[2024-09-30 00:28:14,967][1149865] Num frames 2200...
[2024-09-30 00:28:15,045][1149865] Num frames 2300...
[2024-09-30 00:28:15,125][1149865] Num frames 2400...
[2024-09-30 00:28:15,204][1149865] Num frames 2500...
[2024-09-30 00:28:15,283][1149865] Num frames 2600...
[2024-09-30 00:28:15,363][1149865] Num frames 2700...
[2024-09-30 00:28:15,443][1149865] Num frames 2800...
[2024-09-30 00:28:15,523][1149865] Num frames 2900...
[2024-09-30 00:28:15,602][1149865] Num frames 3000...
[2024-09-30 00:28:15,681][1149865] Num frames 3100...
[2024-09-30 00:28:15,758][1149865] Num frames 3200...
[2024-09-30 00:28:15,834][1149865] Num frames 3300...
[2024-09-30 00:28:15,958][1149865] Avg episode rewards: #0: 40.959, true rewards: #0: 16.960
[2024-09-30 00:28:15,958][1149865] Avg episode reward: 40.959, avg true_objective: 16.960
[2024-09-30 00:28:15,966][1149865] Num frames 3400...
[2024-09-30 00:28:16,049][1149865] Num frames 3500...
[2024-09-30 00:28:16,127][1149865] Num frames 3600...
[2024-09-30 00:28:16,205][1149865] Num frames 3700...
[2024-09-30 00:28:16,284][1149865] Num frames 3800...
[2024-09-30 00:28:16,363][1149865] Num frames 3900...
[2024-09-30 00:28:16,443][1149865] Num frames 4000...
[2024-09-30 00:28:16,521][1149865] Num frames 4100...
[2024-09-30 00:28:16,598][1149865] Num frames 4200...
[2024-09-30 00:28:16,675][1149865] Num frames 4300...
[2024-09-30 00:28:16,754][1149865] Num frames 4400...
[2024-09-30 00:28:16,832][1149865] Num frames 4500...
[2024-09-30 00:28:16,919][1149865] Avg episode rewards: #0: 36.480, true rewards: #0: 15.147
[2024-09-30 00:28:16,919][1149865] Avg episode reward: 36.480, avg true_objective: 15.147
[2024-09-30 00:28:16,968][1149865] Num frames 4600...
[2024-09-30 00:28:17,047][1149865] Num frames 4700...
[2024-09-30 00:28:17,126][1149865] Num frames 4800...
[2024-09-30 00:28:17,206][1149865] Num frames 4900...
[2024-09-30 00:28:17,282][1149865] Num frames 5000...
[2024-09-30 00:28:17,358][1149865] Num frames 5100...
[2024-09-30 00:28:17,470][1149865] Avg episode rewards: #0: 31.192, true rewards: #0: 12.942
[2024-09-30 00:28:17,471][1149865] Avg episode reward: 31.192, avg true_objective: 12.942
[2024-09-30 00:28:17,490][1149865] Num frames 5200...
[2024-09-30 00:28:17,569][1149865] Num frames 5300...
[2024-09-30 00:28:17,649][1149865] Num frames 5400...
[2024-09-30 00:28:17,729][1149865] Num frames 5500...
[2024-09-30 00:28:17,807][1149865] Num frames 5600...
[2024-09-30 00:28:17,887][1149865] Num frames 5700...
[2024-09-30 00:28:17,966][1149865] Num frames 5800...
[2024-09-30 00:28:18,043][1149865] Num frames 5900...
[2024-09-30 00:28:18,120][1149865] Num frames 6000...
[2024-09-30 00:28:18,196][1149865] Num frames 6100...
[2024-09-30 00:28:18,274][1149865] Num frames 6200...
[2024-09-30 00:28:18,352][1149865] Num frames 6300...
[2024-09-30 00:28:18,477][1149865] Avg episode rewards: #0: 30.786, true rewards: #0: 12.786
[2024-09-30 00:28:18,477][1149865] Avg episode reward: 30.786, avg true_objective: 12.786
[2024-09-30 00:28:18,484][1149865] Num frames 6400...
[2024-09-30 00:28:18,564][1149865] Num frames 6500...
[2024-09-30 00:28:18,643][1149865] Num frames 6600...
[2024-09-30 00:28:18,723][1149865] Num frames 6700...
[2024-09-30 00:28:18,803][1149865] Num frames 6800...
[2024-09-30 00:28:18,879][1149865] Num frames 6900...
[2024-09-30 00:28:18,957][1149865] Num frames 7000...
[2024-09-30 00:28:19,033][1149865] Num frames 7100...
[2024-09-30 00:28:19,110][1149865] Num frames 7200...
[2024-09-30 00:28:19,190][1149865] Num frames 7300...
[2024-09-30 00:28:19,271][1149865] Num frames 7400...
[2024-09-30 00:28:19,353][1149865] Num frames 7500...
[2024-09-30 00:28:19,443][1149865] Num frames 7600...
[2024-09-30 00:28:19,538][1149865] Num frames 7700...
[2024-09-30 00:28:19,630][1149865] Num frames 7800...
[2024-09-30 00:28:19,721][1149865] Num frames 7900...
[2024-09-30 00:28:19,817][1149865] Num frames 8000...
[2024-09-30 00:28:19,910][1149865] Num frames 8100...
[2024-09-30 00:28:20,014][1149865] Avg episode rewards: #0: 33.588, true rewards: #0: 13.588
[2024-09-30 00:28:20,015][1149865] Avg episode reward: 33.588, avg true_objective: 13.588
[2024-09-30 00:28:20,062][1149865] Num frames 8200...
[2024-09-30 00:28:20,156][1149865] Num frames 8300...
[2024-09-30 00:28:20,246][1149865] Num frames 8400...
[2024-09-30 00:28:20,341][1149865] Num frames 8500...
[2024-09-30 00:28:20,433][1149865] Num frames 8600...
[2024-09-30 00:28:20,523][1149865] Num frames 8700...
[2024-09-30 00:28:20,616][1149865] Num frames 8800...
[2024-09-30 00:28:20,707][1149865] Num frames 8900...
[2024-09-30 00:28:20,800][1149865] Num frames 9000...
[2024-09-30 00:28:20,892][1149865] Num frames 9100...
[2024-09-30 00:28:20,986][1149865] Num frames 9200...
[2024-09-30 00:28:21,079][1149865] Num frames 9300...
[2024-09-30 00:28:21,172][1149865] Num frames 9400...
[2024-09-30 00:28:21,264][1149865] Num frames 9500...
[2024-09-30 00:28:21,356][1149865] Num frames 9600...
[2024-09-30 00:28:21,451][1149865] Num frames 9700...
[2024-09-30 00:28:21,526][1149865] Avg episode rewards: #0: 34.030, true rewards: #0: 13.887
[2024-09-30 00:28:21,526][1149865] Avg episode reward: 34.030, avg true_objective: 13.887
[2024-09-30 00:28:21,591][1149865] Num frames 9800...
[2024-09-30 00:28:21,672][1149865] Num frames 9900...
[2024-09-30 00:28:21,757][1149865] Num frames 10000...
[2024-09-30 00:28:21,850][1149865] Num frames 10100...
[2024-09-30 00:28:21,945][1149865] Num frames 10200...
[2024-09-30 00:28:22,035][1149865] Num frames 10300...
[2024-09-30 00:28:22,128][1149865] Num frames 10400...
[2024-09-30 00:28:22,220][1149865] Num frames 10500...
[2024-09-30 00:28:22,312][1149865] Num frames 10600...
[2024-09-30 00:28:22,393][1149865] Num frames 10700...
[2024-09-30 00:28:22,474][1149865] Num frames 10800...
[2024-09-30 00:28:22,537][1149865] Avg episode rewards: #0: 32.886, true rewards: #0: 13.511
[2024-09-30 00:28:22,537][1149865] Avg episode reward: 32.886, avg true_objective: 13.511
[2024-09-30 00:28:22,621][1149865] Num frames 10900...
[2024-09-30 00:28:22,714][1149865] Num frames 11000...
[2024-09-30 00:28:22,806][1149865] Num frames 11100...
[2024-09-30 00:28:22,898][1149865] Num frames 11200...
[2024-09-30 00:28:22,990][1149865] Num frames 11300...
[2024-09-30 00:28:23,082][1149865] Num frames 11400...
[2024-09-30 00:28:23,165][1149865] Num frames 11500...
[2024-09-30 00:28:23,247][1149865] Num frames 11600...
[2024-09-30 00:28:23,338][1149865] Num frames 11700...
[2024-09-30 00:28:23,432][1149865] Num frames 11800...
[2024-09-30 00:28:23,522][1149865] Num frames 11900...
[2024-09-30 00:28:23,616][1149865] Num frames 12000...
[2024-09-30 00:28:23,730][1149865] Num frames 12100...
[2024-09-30 00:28:23,823][1149865] Num frames 12200...
[2024-09-30 00:28:23,904][1149865] Num frames 12300...
[2024-09-30 00:28:23,983][1149865] Num frames 12400... [2024-09-30 00:28:24,063][1149865] Num frames 12500... [2024-09-30 00:28:24,150][1149865] Num frames 12600... [2024-09-30 00:28:24,230][1149865] Num frames 12700... [2024-09-30 00:28:24,306][1149865] Num frames 12800... [2024-09-30 00:28:24,390][1149865] Avg episode rewards: #0: 35.268, true rewards: #0: 14.268 [2024-09-30 00:28:24,391][1149865] Avg episode reward: 35.268, avg true_objective: 14.268 [2024-09-30 00:28:24,438][1149865] Num frames 12900... [2024-09-30 00:28:24,516][1149865] Num frames 13000... [2024-09-30 00:28:24,594][1149865] Num frames 13100... [2024-09-30 00:28:24,675][1149865] Num frames 13200... [2024-09-30 00:28:24,755][1149865] Num frames 13300... [2024-09-30 00:28:24,833][1149865] Num frames 13400... [2024-09-30 00:28:24,946][1149865] Avg episode rewards: #0: 33.076, true rewards: #0: 13.476 [2024-09-30 00:28:24,946][1149865] Avg episode reward: 33.076, avg true_objective: 13.476 [2024-09-30 00:28:42,313][1149865] Replay video saved to /home/luyang/workspace/rl/train_dir/default_experiment/replay.mp4!
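Two small post-hoc checks on the numbers above, as a sketch using only values taken from the log. The "Avg episode rewards" lines during evaluation are running means over episodes 1..n, so individual episode rewards can be recovered as r_n = n * avg_n - (n - 1) * avg_{n-1}; and the run-wide training FPS is just total frames over the runner's main_loop time.

```python
# Running means logged after each of the 10 evaluation episodes above.
running_avg = [34.360, 40.959, 36.480, 31.192, 30.786,
               33.588, 34.030, 32.886, 35.268, 33.076]

# Recover per-episode rewards: r_n = n * avg_n - (n - 1) * avg_{n-1}.
rewards = [running_avg[0]] + [
    (n + 1) * running_avg[n] - n * running_avg[n - 1]
    for n in range(1, len(running_avg))
]
print([round(r, 3) for r in rewards])  # e.g. episode 2 scored ~47.558

# Training throughput check from the Runner profile above:
print(4005888 / 115.9303)  # ~34554.3, matching "Collected {0: 4005888}, FPS: 34554.3"
```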