[2024-09-30 00:25:00,956][1148693] Saving configuration to /home/luyang/workspace/rl/train_dir/default_experiment/config.json...
[2024-09-30 00:25:00,961][1148693] Rollout worker 0 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 1 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 2 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 3 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 4 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 5 uses device cpu
[2024-09-30 00:25:00,961][1148693] Rollout worker 6 uses device cpu
[2024-09-30 00:25:00,962][1148693] Rollout worker 7 uses device cpu
[2024-09-30 00:25:01,008][1148693] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:25:01,008][1148693] InferenceWorker_p0-w0: min num requests: 2
[2024-09-30 00:25:01,042][1148693] Starting all processes...
[2024-09-30 00:25:01,042][1148693] Starting process learner_proc0
[2024-09-30 00:25:02,676][1148693] Starting all processes...
[2024-09-30 00:25:02,680][1148981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:25:02,680][1148981] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-30 00:25:02,680][1148693] Starting process inference_proc0-0
[2024-09-30 00:25:02,680][1148693] Starting process rollout_proc0
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc1
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc2
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc3
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc4
[2024-09-30 00:25:02,681][1148693] Starting process rollout_proc5
[2024-09-30 00:25:02,686][1148693] Starting process rollout_proc6
[2024-09-30 00:25:02,686][1148693] Starting process rollout_proc7
[2024-09-30 00:25:02,712][1148981] Num visible devices: 1
[2024-09-30 00:25:02,719][1148981] Starting seed is not provided
[2024-09-30 00:25:02,719][1148981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:25:02,719][1148981] Initializing actor-critic model on device cuda:0
[2024-09-30 00:25:02,719][1148981] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:25:02,720][1148981] RunningMeanStd input shape: (1,)
[2024-09-30 00:25:02,729][1148981] ConvEncoder: input_channels=3
[2024-09-30 00:25:02,801][1148981] Conv encoder output size: 512
[2024-09-30 00:25:02,801][1148981] Policy head output size: 512
[2024-09-30 00:25:02,812][1148981] Created Actor Critic model with architecture:
[2024-09-30 00:25:02,813][1148981] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
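The module tree above can be reproduced as a plain PyTorch sketch. The log does not print the conv kernel sizes or strides, so the values below are an assumption (Sample Factory's default "convnet_simple" filters: 32@8x8/4, 64@4x4/2, 128@3x3/2); with a (3, 72, 128) observation this yields a 2304-dim flattened feature. Only the 512-dim sizes, GRU(512, 512), critic head (512 -> 1), and action head (512 -> 5) are taken directly from the log.

```python
# Minimal sketch of the logged ActorCriticSharedWeights model.
# Conv kernel sizes/strides are assumptions; the 512-dim sizes, the
# GRU core, and the head shapes come from the architecture dump above.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(           # mirrors (conv_head) above
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # (3, 72, 128) -> (128, 3, 6) -> 2304 under the assumed filters,
        # then the MLP maps to the logged 512-dim encoder output
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)              # ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)       # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        # obs: (B, 3, 72, 128) normalized frames; rnn_state: (1, B, hidden)
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        core_out, new_state = self.core(x.unsqueeze(0), rnn_state)
        core_out = core_out.squeeze(0)
        return self.distribution_linear(core_out), self.critic_linear(core_out), new_state

model = ActorCriticSketch()
logits, value, state = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```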
[2024-09-30 00:25:02,951][1148981] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-30 00:25:03,366][1148693] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 1148693], exiting...
[2024-09-30 00:25:03,366][1148693] Runner profile tree view:
main_loop: 2.3244
[2024-09-30 00:25:03,367][1148693] Collected {}, FPS: 0.0
[2024-09-30 00:25:03,367][1148981] Stopping Batcher_0...
[2024-09-30 00:25:03,368][1148981] Loop batcher_evt_loop terminating...
[2024-09-30 00:25:03,637][1148981] No checkpoints found
[2024-09-30 00:25:03,637][1148981] Did not load from checkpoint, starting from scratch!
[2024-09-30 00:25:03,637][1148981] Initialized policy 0 weights for model version 0
[2024-09-30 00:25:03,639][1148981] LearnerWorker_p0 finished initialization!
[2024-09-30 00:25:03,640][1148981] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-09-30 00:25:03,662][1148981] Stopping LearnerWorker_p0...
[2024-09-30 00:25:03,662][1148981] Loop learner_proc0_evt_loop terminating...
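This first run was interrupted (KeyboardInterrupt at 00:25:03) right after the initial checkpoint_000000000_0.pth was written; the relaunch below picks that checkpoint up automatically because it targets the same train_dir/experiment. A hedged sketch of the kind of launch that produces such a run follows; the entry-point module and every flag value are assumptions in the style of the HF Deep RL course setup (the env name is inferred from the hf_repository seen later in this log), not values read from config.json:

```python
# Hypothetical relaunch sketch: Sample Factory resumes automatically
# when train_dir/experiment already contain a checkpoint, which is why
# the second run below loads checkpoint_000000000_0.pth.
import sys
from sf_examples.vizdoom.train_vizdoom import main  # assumed module path

sys.argv = [
    "train_vizdoom",
    "--env=doom_health_gathering_supreme",  # inferred from the hf_repository below
    "--num_workers=8",                      # matches the 8 rollout workers above
    "--train_for_env_steps=4000000",        # the run stops just past 4M frames
]
main()
```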
[2024-09-30 00:26:16,204][1149865] Saving configuration to /home/luyang/workspace/rl/train_dir/default_experiment/config.json...
[2024-09-30 00:26:16,209][1149865] Rollout worker 0 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 1 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 2 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 3 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 4 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 5 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 6 uses device cpu
[2024-09-30 00:26:16,209][1149865] Rollout worker 7 uses device cpu
[2024-09-30 00:26:16,252][1149865] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:16,252][1149865] InferenceWorker_p0-w0: min num requests: 2
[2024-09-30 00:26:16,286][1149865] Starting all processes...
[2024-09-30 00:26:16,286][1149865] Starting process learner_proc0
[2024-09-30 00:26:17,897][1149865] Starting all processes...
[2024-09-30 00:26:17,901][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:17,901][1150061] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-30 00:26:17,901][1149865] Starting process inference_proc0-0
[2024-09-30 00:26:17,901][1149865] Starting process rollout_proc0
[2024-09-30 00:26:17,901][1149865] Starting process rollout_proc1
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc2
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc3
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc4
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc5
[2024-09-30 00:26:17,902][1149865] Starting process rollout_proc6
[2024-09-30 00:26:17,903][1149865] Starting process rollout_proc7
[2024-09-30 00:26:17,953][1150061] Num visible devices: 1
[2024-09-30 00:26:17,959][1150061] Starting seed is not provided
[2024-09-30 00:26:17,959][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:17,959][1150061] Initializing actor-critic model on device cuda:0
[2024-09-30 00:26:17,959][1150061] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:26:17,960][1150061] RunningMeanStd input shape: (1,)
[2024-09-30 00:26:17,968][1150061] ConvEncoder: input_channels=3
[2024-09-30 00:26:18,041][1150061] Conv encoder output size: 512
[2024-09-30 00:26:18,041][1150061] Policy head output size: 512
[2024-09-30 00:26:18,052][1150061] Created Actor Critic model with architecture:
[2024-09-30 00:26:18,052][1150061] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
[2024-09-30 00:26:18,183][1150061] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-30 00:26:18,816][1150061] Loading state from checkpoint /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth...
[2024-09-30 00:26:18,828][1150061] Loading model from checkpoint
[2024-09-30 00:26:18,829][1150061] Loaded experiment state at self.train_step=0, self.env_steps=0
[2024-09-30 00:26:18,829][1150061] Initialized policy 0 weights for model version 0
[2024-09-30 00:26:18,831][1150061] LearnerWorker_p0 finished initialization!
[2024-09-30 00:26:18,831][1150061] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:19,422][1150142] Worker 3 uses CPU cores [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
[2024-09-30 00:26:19,449][1150140] Worker 7 uses CPU cores [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
[2024-09-30 00:26:19,451][1150144] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2024-09-30 00:26:19,456][1150145] Worker 6 uses CPU cores [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83]
[2024-09-30 00:26:19,456][1150137] Worker 5 uses CPU cores [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]
[2024-09-30 00:26:19,462][1150141] Worker 4 uses CPU cores [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[2024-09-30 00:26:19,465][1150143] Worker 1 uses CPU cores [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[2024-09-30 00:26:19,466][1150138] Worker 2 uses CPU cores [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
[2024-09-30 00:26:19,483][1150139] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 00:26:19,484][1150139] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-30 00:26:19,545][1150139] Num visible devices: 1
[2024-09-30 00:26:19,557][1149865] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-30 00:26:19,639][1150139] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:26:19,640][1150139] RunningMeanStd input shape: (1,)
[2024-09-30 00:26:19,648][1150139] ConvEncoder: input_channels=3
[2024-09-30 00:26:19,720][1150139] Conv encoder output size: 512
[2024-09-30 00:26:19,720][1150139] Policy head output size: 512
[2024-09-30 00:26:19,751][1149865] Inference worker 0-0 is ready!
[2024-09-30 00:26:19,751][1149865] All inference workers are ready! Signal rollout workers to start!
[2024-09-30 00:26:19,776][1150144] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,776][1150141] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,777][1150142] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,777][1150138] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,777][1150145] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,781][1150140] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,785][1150137] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:26:19,791][1150143] Doom resolution: 160x120, resize resolution: (128, 72)
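Frames are rendered at Doom's native 160x120 (4:3) and resized to 128x72 before normalization, matching the (3, 72, 128) RunningMeanStd shape logged earlier; note the aspect ratio is not preserved. A one-line sketch of the resize, assuming OpenCV (the actual resize library is not named in the log):

```python
# Resize a raw 160x120 Doom frame to the (72, 128) observation shape.
# cv2 is an assumption here; the log only records the two resolutions.
import numpy as np
import cv2

frame = np.zeros((120, 160, 3), dtype=np.uint8)  # native Doom render (H, W, C)
obs = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)  # dsize is (W, H)
print(obs.shape)  # (72, 128, 3) -> transposed to (3, 72, 128) for the encoder
```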
[2024-09-30 00:26:20,015][1150141] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,019][1150142] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,020][1150145] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,020][1150138] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,021][1150140] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,028][1150137] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,226][1150141] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,233][1150142] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,233][1150145] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,239][1150137] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,271][1150143] Decorrelating experience for 0 frames...
[2024-09-30 00:26:20,481][1150143] Decorrelating experience for 32 frames...
[2024-09-30 00:26:20,496][1150145] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,508][1150142] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,739][1150141] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,745][1150142] Decorrelating experience for 96 frames...
[2024-09-30 00:26:20,759][1150137] Decorrelating experience for 64 frames...
[2024-09-30 00:26:20,987][1150141] Decorrelating experience for 96 frames...
[2024-09-30 00:26:20,991][1150137] Decorrelating experience for 96 frames...
[2024-09-30 00:26:20,993][1150143] Decorrelating experience for 64 frames...
[2024-09-30 00:26:21,227][1150143] Decorrelating experience for 96 frames...
[2024-09-30 00:26:21,234][1150145] Decorrelating experience for 96 frames...
[2024-09-30 00:26:21,489][1150138] Decorrelating experience for 32 frames...
[2024-09-30 00:26:21,652][1150061] Signal inference workers to stop experience collection...
[2024-09-30 00:26:21,655][1150139] InferenceWorker_p0-w0: stopping experience collection
[2024-09-30 00:26:21,743][1150140] Decorrelating experience for 32 frames...
[2024-09-30 00:26:21,758][1150138] Decorrelating experience for 64 frames...
[2024-09-30 00:26:21,995][1150138] Decorrelating experience for 96 frames...
[2024-09-30 00:26:22,002][1150140] Decorrelating experience for 64 frames...
[2024-09-30 00:26:22,237][1150140] Decorrelating experience for 96 frames...
[2024-09-30 00:26:22,624][1150061] Signal inference workers to resume experience collection...
[2024-09-30 00:26:22,624][1150139] InferenceWorker_p0-w0: resuming experience collection
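The "Decorrelating experience" stages above step each worker's parallel environments through an increasing number of throwaway frames (0, 32, 64, 96) so the environments start out of phase and the learner does not receive near-identical trajectory batches. The four 32-frame stages suggest four envs per worker, each offset by 32 frames, though the exact scheme is an inference from the log pattern, not a logged setting. A minimal sketch of the idea, with CartPole as a stand-in for the Doom envs:

```python
# Sketch of experience decorrelation: give each parallel env on a
# worker a different random-action warm-up so rollouts start out of
# phase. The 32-frame stage size mirrors the log; the envs are stand-ins.
import gymnasium as gym

def decorrelate_worker(envs, frames_per_env: int = 32):
    """Warm env i up with i * frames_per_env random steps before collection."""
    for env_idx, env in enumerate(envs):
        env.reset(seed=env_idx)
        for _ in range(env_idx * frames_per_env):  # 0, 32, 64, 96 frames
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()

envs = [gym.make("CartPole-v1") for _ in range(4)]  # stand-in for 4 Doom envs
decorrelate_worker(envs)
```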
[2024-09-30 00:26:23,854][1150139] Updated weights for policy 0, policy_version 10 (0.0128)
[2024-09-30 00:26:24,557][1149865] Fps is (10 sec: 12288.1, 60 sec: 12288.1, 300 sec: 12288.1). Total num frames: 61440. Throughput: 0: 484.0. Samples: 2420. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-09-30 00:26:24,557][1149865] Avg episode reward: [(0, '4.453')]
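Each status line reports trailing-window frame rates (10 s / 60 s / 300 s), the total frame count, learner throughput in samples, and policy lag (how many versions behind the current policy the sampled actions were). Only 5 s elapsed between the first two reports, so the "10 sec" rate is (61440 - 0) / 5.0 s = 12288 FPS, matching the line above. A windowed rate like this can be reproduced with a timestamped deque:

```python
# Sliding-window FPS sketch: keep (time, total_frames) samples and rate
# over the window span. With the log's first two reports,
# (61440 - 0) / 5.0 s = 12288 FPS, matching "10 sec: 12288.1" above.
import time
from collections import deque

class WindowedFps:
    def __init__(self, window_sec: float = 10.0):
        self.window = window_sec
        self.samples = deque()  # (timestamp, total_frames)

    def record(self, total_frames: int, now: float | None = None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        while now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def fps(self) -> float:
        (t0, f0), (t1, f1) = self.samples[0], self.samples[-1]
        return (f1 - f0) / (t1 - t0) if t1 > t0 else float("nan")

m = WindowedFps()
m.record(0, now=0.0)
m.record(61440, now=5.0)
print(m.fps())  # 12288.0
```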
[2024-09-30 00:26:25,059][1150139] Updated weights for policy 0, policy_version 20 (0.0006)
[2024-09-30 00:26:26,154][1150139] Updated weights for policy 0, policy_version 30 (0.0006)
[2024-09-30 00:26:27,304][1150139] Updated weights for policy 0, policy_version 40 (0.0006)
[2024-09-30 00:26:28,424][1150139] Updated weights for policy 0, policy_version 50 (0.0006)
[2024-09-30 00:26:29,557][1149865] Fps is (10 sec: 24166.4, 60 sec: 24166.4, 300 sec: 24166.4). Total num frames: 241664. Throughput: 0: 5481.2. Samples: 54812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-30 00:26:29,557][1149865] Avg episode reward: [(0, '4.420')]
[2024-09-30 00:26:29,562][1150061] Saving new best policy, reward=4.420!
[2024-09-30 00:26:29,562][1150139] Updated weights for policy 0, policy_version 60 (0.0005)
[2024-09-30 00:26:30,697][1150139] Updated weights for policy 0, policy_version 70 (0.0005)
[2024-09-30 00:26:31,899][1150139] Updated weights for policy 0, policy_version 80 (0.0005)
[2024-09-30 00:26:33,041][1150139] Updated weights for policy 0, policy_version 90 (0.0005)
[2024-09-30 00:26:34,164][1150139] Updated weights for policy 0, policy_version 100 (0.0006)
[2024-09-30 00:26:34,557][1149865] Fps is (10 sec: 36044.6, 60 sec: 28125.8, 300 sec: 28125.8). Total num frames: 421888. Throughput: 0: 5422.5. Samples: 81338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-30 00:26:34,557][1149865] Avg episode reward: [(0, '4.360')]
[2024-09-30 00:26:35,313][1150139] Updated weights for policy 0, policy_version 110 (0.0006)
[2024-09-30 00:26:36,243][1149865] Heartbeat connected on Batcher_0
[2024-09-30 00:26:36,247][1149865] Heartbeat connected on LearnerWorker_p0
[2024-09-30 00:26:36,254][1149865] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-30 00:26:36,261][1149865] Heartbeat connected on RolloutWorker_w1
[2024-09-30 00:26:36,265][1149865] Heartbeat connected on RolloutWorker_w2
[2024-09-30 00:26:36,270][1149865] Heartbeat connected on RolloutWorker_w3
[2024-09-30 00:26:36,273][1149865] Heartbeat connected on RolloutWorker_w4
[2024-09-30 00:26:36,278][1149865] Heartbeat connected on RolloutWorker_w5
[2024-09-30 00:26:36,283][1149865] Heartbeat connected on RolloutWorker_w6
[2024-09-30 00:26:36,286][1149865] Heartbeat connected on RolloutWorker_w7
[2024-09-30 00:26:36,388][1150139] Updated weights for policy 0, policy_version 120 (0.0006)
[2024-09-30 00:26:37,485][1150139] Updated weights for policy 0, policy_version 130 (0.0005)
[2024-09-30 00:26:38,623][1150139] Updated weights for policy 0, policy_version 140 (0.0005)
[2024-09-30 00:26:39,557][1149865] Fps is (10 sec: 36454.4, 60 sec: 30310.4, 300 sec: 30310.4). Total num frames: 606208. Throughput: 0: 6816.1. Samples: 136322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:26:39,557][1149865] Avg episode reward: [(0, '4.362')]
[2024-09-30 00:26:39,793][1150139] Updated weights for policy 0, policy_version 150 (0.0005)
[2024-09-30 00:26:40,942][1150139] Updated weights for policy 0, policy_version 160 (0.0006)
[2024-09-30 00:26:42,120][1150139] Updated weights for policy 0, policy_version 170 (0.0006)
[2024-09-30 00:26:43,255][1150139] Updated weights for policy 0, policy_version 180 (0.0006)
[2024-09-30 00:26:44,363][1150139] Updated weights for policy 0, policy_version 190 (0.0006)
[2024-09-30 00:26:44,557][1149865] Fps is (10 sec: 36044.9, 60 sec: 31293.4, 300 sec: 31293.4). Total num frames: 782336. Throughput: 0: 7583.6. Samples: 189590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-30 00:26:44,557][1149865] Avg episode reward: [(0, '4.718')]
[2024-09-30 00:26:44,560][1150061] Saving new best policy, reward=4.718!
[2024-09-30 00:26:45,498][1150139] Updated weights for policy 0, policy_version 200 (0.0006)
[2024-09-30 00:26:46,658][1150139] Updated weights for policy 0, policy_version 210 (0.0005)
[2024-09-30 00:26:47,749][1150139] Updated weights for policy 0, policy_version 220 (0.0005)
[2024-09-30 00:26:48,879][1150139] Updated weights for policy 0, policy_version 230 (0.0006)
[2024-09-30 00:26:49,557][1149865] Fps is (10 sec: 36044.5, 60 sec: 32221.8, 300 sec: 32221.8). Total num frames: 966656. Throughput: 0: 7214.5. Samples: 216436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:26:49,558][1149865] Avg episode reward: [(0, '4.984')]
[2024-09-30 00:26:49,558][1150061] Saving new best policy, reward=4.984!
[2024-09-30 00:26:49,996][1150139] Updated weights for policy 0, policy_version 240 (0.0006)
[2024-09-30 00:26:51,078][1150139] Updated weights for policy 0, policy_version 250 (0.0005)
[2024-09-30 00:26:52,165][1150139] Updated weights for policy 0, policy_version 260 (0.0005)
[2024-09-30 00:26:53,360][1150139] Updated weights for policy 0, policy_version 270 (0.0006)
[2024-09-30 00:26:54,557][1149865] Fps is (10 sec: 36044.3, 60 sec: 32650.8, 300 sec: 32650.8). Total num frames: 1142784. Throughput: 0: 7775.5. Samples: 272144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:26:54,558][1149865] Avg episode reward: [(0, '6.825')]
[2024-09-30 00:26:54,561][1150061] Saving new best policy, reward=6.825!
[2024-09-30 00:26:54,627][1150139] Updated weights for policy 0, policy_version 280 (0.0006)
[2024-09-30 00:26:55,804][1150139] Updated weights for policy 0, policy_version 290 (0.0006)
[2024-09-30 00:26:56,921][1150139] Updated weights for policy 0, policy_version 300 (0.0006)
[2024-09-30 00:26:58,061][1150139] Updated weights for policy 0, policy_version 310 (0.0006)
[2024-09-30 00:26:59,306][1150139] Updated weights for policy 0, policy_version 320 (0.0006)
[2024-09-30 00:26:59,557][1149865] Fps is (10 sec: 35225.8, 60 sec: 32972.8, 300 sec: 32972.8). Total num frames: 1318912. Throughput: 0: 8094.2. Samples: 323770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:26:59,557][1149865] Avg episode reward: [(0, '7.969')]
[2024-09-30 00:26:59,558][1150061] Saving new best policy, reward=7.969!
[2024-09-30 00:27:00,379][1150139] Updated weights for policy 0, policy_version 330 (0.0006)
[2024-09-30 00:27:01,493][1150139] Updated weights for policy 0, policy_version 340 (0.0006)
[2024-09-30 00:27:02,648][1150139] Updated weights for policy 0, policy_version 350 (0.0005)
[2024-09-30 00:27:03,846][1150139] Updated weights for policy 0, policy_version 360 (0.0006)
[2024-09-30 00:27:04,557][1149865] Fps is (10 sec: 35635.7, 60 sec: 33314.1, 300 sec: 33314.1). Total num frames: 1499136. Throughput: 0: 7803.8. Samples: 351172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:27:04,557][1149865] Avg episode reward: [(0, '9.395')]
[2024-09-30 00:27:04,560][1150061] Saving new best policy, reward=9.395!
[2024-09-30 00:27:04,958][1150139] Updated weights for policy 0, policy_version 370 (0.0005)
[2024-09-30 00:27:06,158][1150139] Updated weights for policy 0, policy_version 380 (0.0006)
[2024-09-30 00:27:07,341][1150139] Updated weights for policy 0, policy_version 390 (0.0006)
[2024-09-30 00:27:08,471][1150139] Updated weights for policy 0, policy_version 400 (0.0006)
[2024-09-30 00:27:09,550][1150139] Updated weights for policy 0, policy_version 410 (0.0005)
[2024-09-30 00:27:09,557][1149865] Fps is (10 sec: 36045.1, 60 sec: 33587.2, 300 sec: 33587.2). Total num frames: 1679360. Throughput: 0: 8916.3. Samples: 403652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:27:09,557][1149865] Avg episode reward: [(0, '10.451')]
[2024-09-30 00:27:09,558][1150061] Saving new best policy, reward=10.451!
[2024-09-30 00:27:10,625][1150139] Updated weights for policy 0, policy_version 420 (0.0005)
[2024-09-30 00:27:11,753][1150139] Updated weights for policy 0, policy_version 430 (0.0006)
[2024-09-30 00:27:12,906][1150139] Updated weights for policy 0, policy_version 440 (0.0005)
[2024-09-30 00:27:14,043][1150139] Updated weights for policy 0, policy_version 450 (0.0005)
[2024-09-30 00:27:14,557][1149865] Fps is (10 sec: 36044.8, 60 sec: 33810.6, 300 sec: 33810.6). Total num frames: 1859584. Throughput: 0: 8974.4. Samples: 458662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-30 00:27:14,557][1149865] Avg episode reward: [(0, '13.438')]
[2024-09-30 00:27:14,560][1150061] Saving new best policy, reward=13.438!
[2024-09-30 00:27:15,159][1150139] Updated weights for policy 0, policy_version 460 (0.0006)
[2024-09-30 00:27:16,224][1150139] Updated weights for policy 0, policy_version 470 (0.0006)
[2024-09-30 00:27:17,339][1150139] Updated weights for policy 0, policy_version 480 (0.0006)
[2024-09-30 00:27:18,411][1150139] Updated weights for policy 0, policy_version 490 (0.0006)
[2024-09-30 00:27:19,490][1150139] Updated weights for policy 0, policy_version 500 (0.0006)
[2024-09-30 00:27:19,557][1149865] Fps is (10 sec: 36863.4, 60 sec: 34133.3, 300 sec: 34133.3). Total num frames: 2048000. Throughput: 0: 9008.8. Samples: 486736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-30 00:27:19,558][1149865] Avg episode reward: [(0, '15.719')]
[2024-09-30 00:27:19,558][1150061] Saving new best policy, reward=15.719!
[2024-09-30 00:27:20,560][1150139] Updated weights for policy 0, policy_version 510 (0.0006)
[2024-09-30 00:27:21,675][1150139] Updated weights for policy 0, policy_version 520 (0.0006)
[2024-09-30 00:27:22,733][1150139] Updated weights for policy 0, policy_version 530 (0.0006)
[2024-09-30 00:27:23,821][1150139] Updated weights for policy 0, policy_version 540 (0.0006)
[2024-09-30 00:27:24,557][1149865] Fps is (10 sec: 37683.2, 60 sec: 36249.6, 300 sec: 34406.4). Total num frames: 2236416. Throughput: 0: 9047.6. Samples: 543462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-30 00:27:24,557][1149865] Avg episode reward: [(0, '18.072')]
[2024-09-30 00:27:24,569][1150061] Saving new best policy, reward=18.072!
[2024-09-30 00:27:24,893][1150139] Updated weights for policy 0, policy_version 550 (0.0006)
[2024-09-30 00:27:25,971][1150139] Updated weights for policy 0, policy_version 560 (0.0005)
[2024-09-30 00:27:27,037][1150139] Updated weights for policy 0, policy_version 570 (0.0006)
[2024-09-30 00:27:28,155][1150139] Updated weights for policy 0, policy_version 580 (0.0005)
[2024-09-30 00:27:29,272][1150139] Updated weights for policy 0, policy_version 590 (0.0006)
[2024-09-30 00:27:29,557][1149865] Fps is (10 sec: 37683.7, 60 sec: 36386.1, 300 sec: 34640.5). Total num frames: 2424832. Throughput: 0: 9118.1. Samples: 599906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:29,557][1149865] Avg episode reward: [(0, '20.999')]
[2024-09-30 00:27:29,558][1150061] Saving new best policy, reward=20.999!
[2024-09-30 00:27:30,323][1150139] Updated weights for policy 0, policy_version 600 (0.0006)
[2024-09-30 00:27:31,407][1150139] Updated weights for policy 0, policy_version 610 (0.0006)
[2024-09-30 00:27:32,467][1150139] Updated weights for policy 0, policy_version 620 (0.0006)
[2024-09-30 00:27:33,593][1150139] Updated weights for policy 0, policy_version 630 (0.0006)
[2024-09-30 00:27:34,557][1149865] Fps is (10 sec: 37683.3, 60 sec: 36522.7, 300 sec: 34843.3). Total num frames: 2613248. Throughput: 0: 9160.0. Samples: 628636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-30 00:27:34,557][1149865] Avg episode reward: [(0, '20.389')]
[2024-09-30 00:27:34,683][1150139] Updated weights for policy 0, policy_version 640 (0.0005)
[2024-09-30 00:27:35,755][1150139] Updated weights for policy 0, policy_version 650 (0.0006)
[2024-09-30 00:27:36,830][1150139] Updated weights for policy 0, policy_version 660 (0.0006)
[2024-09-30 00:27:37,890][1150139] Updated weights for policy 0, policy_version 670 (0.0006)
[2024-09-30 00:27:38,971][1150139] Updated weights for policy 0, policy_version 680 (0.0006)
[2024-09-30 00:27:39,557][1149865] Fps is (10 sec: 38092.7, 60 sec: 36659.2, 300 sec: 35072.0). Total num frames: 2805760. Throughput: 0: 9181.5. Samples: 685308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:39,557][1149865] Avg episode reward: [(0, '19.388')]
[2024-09-30 00:27:40,031][1150139] Updated weights for policy 0, policy_version 690 (0.0006)
[2024-09-30 00:27:41,100][1150139] Updated weights for policy 0, policy_version 700 (0.0005)
[2024-09-30 00:27:42,153][1150139] Updated weights for policy 0, policy_version 710 (0.0006)
[2024-09-30 00:27:43,230][1150139] Updated weights for policy 0, policy_version 720 (0.0006)
[2024-09-30 00:27:44,312][1150139] Updated weights for policy 0, policy_version 730 (0.0006)
[2024-09-30 00:27:44,557][1149865] Fps is (10 sec: 38502.4, 60 sec: 36932.3, 300 sec: 35273.8). Total num frames: 2998272. Throughput: 0: 9311.9. Samples: 742806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:44,557][1149865] Avg episode reward: [(0, '22.197')]
[2024-09-30 00:27:44,560][1150061] Saving new best policy, reward=22.197!
[2024-09-30 00:27:45,376][1150139] Updated weights for policy 0, policy_version 740 (0.0006)
[2024-09-30 00:27:46,438][1150139] Updated weights for policy 0, policy_version 750 (0.0006)
[2024-09-30 00:27:47,515][1150139] Updated weights for policy 0, policy_version 760 (0.0005)
[2024-09-30 00:27:48,572][1150139] Updated weights for policy 0, policy_version 770 (0.0006)
[2024-09-30 00:27:49,557][1149865] Fps is (10 sec: 38502.4, 60 sec: 37068.9, 300 sec: 35453.2). Total num frames: 3190784. Throughput: 0: 9343.1. Samples: 771610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-30 00:27:49,557][1149865] Avg episode reward: [(0, '21.389')]
[2024-09-30 00:27:49,645][1150139] Updated weights for policy 0, policy_version 780 (0.0006)
[2024-09-30 00:27:50,716][1150139] Updated weights for policy 0, policy_version 790 (0.0005)
[2024-09-30 00:27:51,800][1150139] Updated weights for policy 0, policy_version 800 (0.0006)
[2024-09-30 00:27:52,845][1150139] Updated weights for policy 0, policy_version 810 (0.0006)
[2024-09-30 00:27:53,913][1150139] Updated weights for policy 0, policy_version 820 (0.0005)
[2024-09-30 00:27:54,557][1149865] Fps is (10 sec: 38092.7, 60 sec: 37273.7, 300 sec: 35570.5). Total num frames: 3379200. Throughput: 0: 9454.0. Samples: 829084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-30 00:27:54,557][1149865] Avg episode reward: [(0, '25.173')]
[2024-09-30 00:27:54,560][1150061] Saving new best policy, reward=25.173!
[2024-09-30 00:27:54,982][1150139] Updated weights for policy 0, policy_version 830 (0.0006)
[2024-09-30 00:27:56,101][1150139] Updated weights for policy 0, policy_version 840 (0.0006)
[2024-09-30 00:27:57,180][1150139] Updated weights for policy 0, policy_version 850 (0.0005)
[2024-09-30 00:27:58,258][1150139] Updated weights for policy 0, policy_version 860 (0.0006)
[2024-09-30 00:27:59,351][1150139] Updated weights for policy 0, policy_version 870 (0.0005)
[2024-09-30 00:27:59,557][1149865] Fps is (10 sec: 38092.9, 60 sec: 37546.7, 300 sec: 35717.1). Total num frames: 3571712. Throughput: 0: 9494.8. Samples: 885926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-30 00:27:59,557][1149865] Avg episode reward: [(0, '21.345')]
[2024-09-30 00:28:00,433][1150139] Updated weights for policy 0, policy_version 880 (0.0006)
[2024-09-30 00:28:01,499][1150139] Updated weights for policy 0, policy_version 890 (0.0006)
[2024-09-30 00:28:02,611][1150139] Updated weights for policy 0, policy_version 900 (0.0006)
[2024-09-30 00:28:03,681][1150139] Updated weights for policy 0, policy_version 910 (0.0006)
[2024-09-30 00:28:04,557][1149865] Fps is (10 sec: 38092.8, 60 sec: 37683.2, 300 sec: 35810.7). Total num frames: 3760128. Throughput: 0: 9499.9. Samples: 914232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:28:04,557][1149865] Avg episode reward: [(0, '25.023')]
[2024-09-30 00:28:04,751][1150139] Updated weights for policy 0, policy_version 920 (0.0006)
[2024-09-30 00:28:05,834][1150139] Updated weights for policy 0, policy_version 930 (0.0006)
[2024-09-30 00:28:07,053][1150139] Updated weights for policy 0, policy_version 940 (0.0006)
[2024-09-30 00:28:08,307][1150139] Updated weights for policy 0, policy_version 950 (0.0006)
[2024-09-30 00:28:09,418][1150139] Updated weights for policy 0, policy_version 960 (0.0006)
[2024-09-30 00:28:09,557][1149865] Fps is (10 sec: 36453.7, 60 sec: 37614.8, 300 sec: 35784.1). Total num frames: 3936256. Throughput: 0: 9452.1. Samples: 968808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-30 00:28:09,558][1149865] Avg episode reward: [(0, '26.893')]
[2024-09-30 00:28:09,558][1150061] Saving new best policy, reward=26.893!
[2024-09-30 00:28:10,576][1150139] Updated weights for policy 0, policy_version 970 (0.0006)
[2024-09-30 00:28:11,497][1149865] Component Batcher_0 stopped!
[2024-09-30 00:28:11,497][1150061] Stopping Batcher_0...
[2024-09-30 00:28:11,497][1149865] Component RolloutWorker_w0 process died already! Don't wait for it.
[2024-09-30 00:28:11,497][1150061] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-30 00:28:11,498][1150061] Loop batcher_evt_loop terminating...
[2024-09-30 00:28:11,513][1150139] Weights refcount: 2 0
[2024-09-30 00:28:11,514][1150139] Stopping InferenceWorker_p0-w0...
[2024-09-30 00:28:11,514][1150139] Loop inference_proc0-0_evt_loop terminating...
[2024-09-30 00:28:11,514][1149865] Component InferenceWorker_p0-w0 stopped!
[2024-09-30 00:28:11,527][1150138] Stopping RolloutWorker_w2...
[2024-09-30 00:28:11,527][1149865] Component RolloutWorker_w2 stopped!
[2024-09-30 00:28:11,528][1150138] Loop rollout_proc2_evt_loop terminating...
[2024-09-30 00:28:11,530][1150142] Stopping RolloutWorker_w3...
[2024-09-30 00:28:11,530][1149865] Component RolloutWorker_w3 stopped!
[2024-09-30 00:28:11,530][1150142] Loop rollout_proc3_evt_loop terminating...
[2024-09-30 00:28:11,531][1149865] Component RolloutWorker_w5 stopped!
[2024-09-30 00:28:11,531][1150137] Stopping RolloutWorker_w5...
[2024-09-30 00:28:11,531][1149865] Component RolloutWorker_w6 stopped!
[2024-09-30 00:28:11,531][1150145] Stopping RolloutWorker_w6...
[2024-09-30 00:28:11,531][1150137] Loop rollout_proc5_evt_loop terminating...
[2024-09-30 00:28:11,531][1150145] Loop rollout_proc6_evt_loop terminating...
[2024-09-30 00:28:11,532][1149865] Component RolloutWorker_w1 stopped!
[2024-09-30 00:28:11,532][1150143] Stopping RolloutWorker_w1...
[2024-09-30 00:28:11,533][1150143] Loop rollout_proc1_evt_loop terminating...
[2024-09-30 00:28:11,533][1149865] Component RolloutWorker_w4 stopped!
[2024-09-30 00:28:11,533][1150141] Stopping RolloutWorker_w4...
[2024-09-30 00:28:11,533][1150141] Loop rollout_proc4_evt_loop terminating...
[2024-09-30 00:28:11,536][1149865] Component RolloutWorker_w7 stopped!
[2024-09-30 00:28:11,536][1150140] Stopping RolloutWorker_w7...
[2024-09-30 00:28:11,536][1150140] Loop rollout_proc7_evt_loop terminating...
[2024-09-30 00:28:11,548][1150061] Saving /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-30 00:28:11,677][1150061] Stopping LearnerWorker_p0...
[2024-09-30 00:28:11,677][1150061] Loop learner_proc0_evt_loop terminating...
[2024-09-30 00:28:11,677][1149865] Component LearnerWorker_p0 stopped!
[2024-09-30 00:28:11,678][1149865] Waiting for process learner_proc0 to stop...
[2024-09-30 00:28:12,213][1149865] Waiting for process inference_proc0-0 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc0 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc1 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc2 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc3 to join...
[2024-09-30 00:28:12,214][1149865] Waiting for process rollout_proc4 to join...
[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc5 to join...
[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc6 to join...
[2024-09-30 00:28:12,215][1149865] Waiting for process rollout_proc7 to join...
[2024-09-30 00:28:12,215][1149865] Batcher 0 profile tree view:
batching: 8.1702, releasing_batches: 0.0148
[2024-09-30 00:28:12,215][1149865] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 2.2430
update_model: 1.6718
weight_update: 0.0006
one_step: 0.0013
handle_policy_step: 101.6105
deserialize: 4.2527, stack: 0.5251, obs_to_device_normalize: 21.3149, forward: 52.1177, send_messages: 6.7725
prepare_outputs: 11.8901
to_cpu: 6.4354
[2024-09-30 00:28:12,216][1149865] Learner 0 profile tree view:
misc: 0.0031, prepare_batch: 4.0428
train: 10.3860
epoch_init: 0.0033, minibatch_init: 0.0037, losses_postprocess: 0.1662, kl_divergence: 0.2113, after_optimizer: 0.8304
calculate_losses: 4.6270
losses_init: 0.0020, forward_head: 0.3762, bptt_initial: 2.3802, tail: 0.3318, advantages_returns: 0.0873, losses: 0.6229
bptt: 0.7209
bptt_forward_core: 0.6909
update: 4.3204
clip: 0.4495
[2024-09-30 00:28:12,216][1149865] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0819, enqueue_policy_requests: 4.5546, env_step: 67.0989, overhead: 3.2408, complete_rollouts: 0.1226
save_policy_outputs: 5.6070
split_output_tensors: 1.8787
[2024-09-30 00:28:12,216][1149865] Loop Runner_EvtLoop terminating...
[2024-09-30 00:28:12,216][1149865] Runner profile tree view:
main_loop: 115.9303
[2024-09-30 00:28:12,216][1149865] Collected {0: 4005888}, FPS: 34554.3
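The closing totals are internally consistent, and the checkpoint name encodes the same numbers: checkpoint_000000978_4005888.pth is policy version 978 at 4,005,888 env frames, i.e. exactly 978 x 4096 frames, which suggests 4096 samples per policy update (an inference from the numbers, not a logged setting). A quick check:

```python
# Worked check of the run totals against the log lines above.
frames, wall_sec = 4_005_888, 115.9303  # "Collected {0: 4005888}", main_loop
print(round(frames / wall_sec, 1))      # 34554.3 -> matches "FPS: 34554.3"
policy_version = 978                    # from checkpoint_000000978_4005888.pth
print(frames // policy_version)         # 4096 frames per policy update (inferred)
```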
[2024-09-30 00:28:12,419][1149865] Loading existing experiment configuration from /home/luyang/workspace/rl/train_dir/default_experiment/config.json
[2024-09-30 00:28:12,419][1149865] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-30 00:28:12,420][1149865] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'hf_repository'='esperesa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-30 00:28:12,420][1149865] Using frameskip 1 and render_action_repeat=4 for evaluation
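The argument overrides above correspond to Sample Factory's enjoy/evaluation step, which replays the trained checkpoint for 10 episodes, writes replay.mp4, and pushes the experiment to esperesa/rl_course_vizdoom_health_gathering_supreme. A hedged sketch of the equivalent invocation; the entry-point module is an assumption from the sf_examples layout, while the flag values mirror the "Overriding/Adding new argument" lines logged above:

```python
# Sketch of the evaluation/upload step that produced the lines above.
# Module path is assumed; the flag values are taken from this log.
import sys
from sf_examples.vizdoom.enjoy_vizdoom import main  # assumed module path

sys.argv = [
    "enjoy_vizdoom",
    "--env=doom_health_gathering_supreme",  # inferred from the repo name
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=esperesa/rl_course_vizdoom_health_gathering_supreme",
]
main()
```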
[2024-09-30 00:28:12,441][1149865] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 00:28:12,443][1149865] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 00:28:12,443][1149865] RunningMeanStd input shape: (1,)
[2024-09-30 00:28:12,452][1149865] ConvEncoder: input_channels=3
[2024-09-30 00:28:12,522][1149865] Conv encoder output size: 512
[2024-09-30 00:28:12,522][1149865] Policy head output size: 512
[2024-09-30 00:28:12,681][1149865] Loading state from checkpoint /home/luyang/workspace/rl/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-09-30 00:28:13,271][1149865] Num frames 100...
[2024-09-30 00:28:13,350][1149865] Num frames 200...
[2024-09-30 00:28:13,427][1149865] Num frames 300...
[2024-09-30 00:28:13,503][1149865] Num frames 400...
[2024-09-30 00:28:13,581][1149865] Num frames 500...
[2024-09-30 00:28:13,659][1149865] Num frames 600...
[2024-09-30 00:28:13,738][1149865] Num frames 700...
[2024-09-30 00:28:13,815][1149865] Num frames 800...
[2024-09-30 00:28:13,894][1149865] Num frames 900...
[2024-09-30 00:28:13,973][1149865] Num frames 1000...
[2024-09-30 00:28:14,052][1149865] Num frames 1100...
[2024-09-30 00:28:14,132][1149865] Num frames 1200...
[2024-09-30 00:28:14,208][1149865] Num frames 1300...
[2024-09-30 00:28:14,285][1149865] Num frames 1400...
[2024-09-30 00:28:14,364][1149865] Num frames 1500...
[2024-09-30 00:28:14,445][1149865] Avg episode rewards: #0: 34.360, true rewards: #0: 15.360
[2024-09-30 00:28:14,445][1149865] Avg episode reward: 34.360, avg true_objective: 15.360
[2024-09-30 00:28:14,497][1149865] Num frames 1600...
[2024-09-30 00:28:14,575][1149865] Num frames 1700...
[2024-09-30 00:28:14,654][1149865] Num frames 1800...
[2024-09-30 00:28:14,733][1149865] Num frames 1900...
[2024-09-30 00:28:14,811][1149865] Num frames 2000...
[2024-09-30 00:28:14,891][1149865] Num frames 2100...
[2024-09-30 00:28:14,967][1149865] Num frames 2200...
[2024-09-30 00:28:15,045][1149865] Num frames 2300...
[2024-09-30 00:28:15,125][1149865] Num frames 2400...
[2024-09-30 00:28:15,204][1149865] Num frames 2500...
[2024-09-30 00:28:15,283][1149865] Num frames 2600...
[2024-09-30 00:28:15,363][1149865] Num frames 2700...
[2024-09-30 00:28:15,443][1149865] Num frames 2800...
[2024-09-30 00:28:15,523][1149865] Num frames 2900...
[2024-09-30 00:28:15,602][1149865] Num frames 3000...
[2024-09-30 00:28:15,681][1149865] Num frames 3100...
[2024-09-30 00:28:15,758][1149865] Num frames 3200...
[2024-09-30 00:28:15,834][1149865] Num frames 3300...
[2024-09-30 00:28:15,958][1149865] Avg episode rewards: #0: 40.959, true rewards: #0: 16.960
[2024-09-30 00:28:15,958][1149865] Avg episode reward: 40.959, avg true_objective: 16.960
[2024-09-30 00:28:15,966][1149865] Num frames 3400...
[2024-09-30 00:28:16,049][1149865] Num frames 3500...
[2024-09-30 00:28:16,127][1149865] Num frames 3600...
[2024-09-30 00:28:16,205][1149865] Num frames 3700...
[2024-09-30 00:28:16,284][1149865] Num frames 3800...
[2024-09-30 00:28:16,363][1149865] Num frames 3900...
[2024-09-30 00:28:16,443][1149865] Num frames 4000...
[2024-09-30 00:28:16,521][1149865] Num frames 4100...
[2024-09-30 00:28:16,598][1149865] Num frames 4200...
[2024-09-30 00:28:16,675][1149865] Num frames 4300...
[2024-09-30 00:28:16,754][1149865] Num frames 4400...
[2024-09-30 00:28:16,832][1149865] Num frames 4500...
[2024-09-30 00:28:16,919][1149865] Avg episode rewards: #0: 36.480, true rewards: #0: 15.147
[2024-09-30 00:28:16,919][1149865] Avg episode reward: 36.480, avg true_objective: 15.147
[2024-09-30 00:28:16,968][1149865] Num frames 4600...
[2024-09-30 00:28:17,047][1149865] Num frames 4700...
[2024-09-30 00:28:17,126][1149865] Num frames 4800...
[2024-09-30 00:28:17,206][1149865] Num frames 4900...
[2024-09-30 00:28:17,282][1149865] Num frames 5000...
[2024-09-30 00:28:17,358][1149865] Num frames 5100...
[2024-09-30 00:28:17,470][1149865] Avg episode rewards: #0: 31.192, true rewards: #0: 12.942
[2024-09-30 00:28:17,471][1149865] Avg episode reward: 31.192, avg true_objective: 12.942
[2024-09-30 00:28:17,490][1149865] Num frames 5200...
[2024-09-30 00:28:17,569][1149865] Num frames 5300...
[2024-09-30 00:28:17,649][1149865] Num frames 5400...
[2024-09-30 00:28:17,729][1149865] Num frames 5500...
[2024-09-30 00:28:17,807][1149865] Num frames 5600...
[2024-09-30 00:28:17,887][1149865] Num frames 5700...
[2024-09-30 00:28:17,966][1149865] Num frames 5800...
[2024-09-30 00:28:18,043][1149865] Num frames 5900...
[2024-09-30 00:28:18,120][1149865] Num frames 6000...
[2024-09-30 00:28:18,196][1149865] Num frames 6100...
[2024-09-30 00:28:18,274][1149865] Num frames 6200...
[2024-09-30 00:28:18,352][1149865] Num frames 6300...
[2024-09-30 00:28:18,477][1149865] Avg episode rewards: #0: 30.786, true rewards: #0: 12.786
[2024-09-30 00:28:18,477][1149865] Avg episode reward: 30.786, avg true_objective: 12.786
[2024-09-30 00:28:18,484][1149865] Num frames 6400...
[2024-09-30 00:28:18,564][1149865] Num frames 6500...
[2024-09-30 00:28:18,643][1149865] Num frames 6600...
[2024-09-30 00:28:18,723][1149865] Num frames 6700...
[2024-09-30 00:28:18,803][1149865] Num frames 6800...
[2024-09-30 00:28:18,879][1149865] Num frames 6900...
[2024-09-30 00:28:18,957][1149865] Num frames 7000...
[2024-09-30 00:28:19,033][1149865] Num frames 7100...
[2024-09-30 00:28:19,110][1149865] Num frames 7200...
[2024-09-30 00:28:19,190][1149865] Num frames 7300...
[2024-09-30 00:28:19,271][1149865] Num frames 7400...
[2024-09-30 00:28:19,353][1149865] Num frames 7500...
[2024-09-30 00:28:19,443][1149865] Num frames 7600...
[2024-09-30 00:28:19,538][1149865] Num frames 7700...
[2024-09-30 00:28:19,630][1149865] Num frames 7800...
[2024-09-30 00:28:19,721][1149865] Num frames 7900...
[2024-09-30 00:28:19,817][1149865] Num frames 8000...
[2024-09-30 00:28:19,910][1149865] Num frames 8100...
[2024-09-30 00:28:20,014][1149865] Avg episode rewards: #0: 33.588, true rewards: #0: 13.588
[2024-09-30 00:28:20,015][1149865] Avg episode reward: 33.588, avg true_objective: 13.588
[2024-09-30 00:28:20,062][1149865] Num frames 8200...
[2024-09-30 00:28:20,156][1149865] Num frames 8300...
[2024-09-30 00:28:20,246][1149865] Num frames 8400...
[2024-09-30 00:28:20,341][1149865] Num frames 8500...
[2024-09-30 00:28:20,433][1149865] Num frames 8600...
[2024-09-30 00:28:20,523][1149865] Num frames 8700...
[2024-09-30 00:28:20,616][1149865] Num frames 8800...
[2024-09-30 00:28:20,707][1149865] Num frames 8900...
[2024-09-30 00:28:20,800][1149865] Num frames 9000...
[2024-09-30 00:28:20,892][1149865] Num frames 9100...
[2024-09-30 00:28:20,986][1149865] Num frames 9200...
[2024-09-30 00:28:21,079][1149865] Num frames 9300...
[2024-09-30 00:28:21,172][1149865] Num frames 9400...
[2024-09-30 00:28:21,264][1149865] Num frames 9500...
[2024-09-30 00:28:21,356][1149865] Num frames 9600...
[2024-09-30 00:28:21,451][1149865] Num frames 9700...
[2024-09-30 00:28:21,526][1149865] Avg episode rewards: #0: 34.030, true rewards: #0: 13.887
[2024-09-30 00:28:21,526][1149865] Avg episode reward: 34.030, avg true_objective: 13.887
[2024-09-30 00:28:21,591][1149865] Num frames 9800...
[2024-09-30 00:28:21,672][1149865] Num frames 9900...
[2024-09-30 00:28:21,757][1149865] Num frames 10000...
[2024-09-30 00:28:21,850][1149865] Num frames 10100...
[2024-09-30 00:28:21,945][1149865] Num frames 10200...
[2024-09-30 00:28:22,035][1149865] Num frames 10300...
[2024-09-30 00:28:22,128][1149865] Num frames 10400...
[2024-09-30 00:28:22,220][1149865] Num frames 10500...
[2024-09-30 00:28:22,312][1149865] Num frames 10600...
[2024-09-30 00:28:22,393][1149865] Num frames 10700...
[2024-09-30 00:28:22,474][1149865] Num frames 10800...
[2024-09-30 00:28:22,537][1149865] Avg episode rewards: #0: 32.886, true rewards: #0: 13.511
[2024-09-30 00:28:22,537][1149865] Avg episode reward: 32.886, avg true_objective: 13.511
[2024-09-30 00:28:22,621][1149865] Num frames 10900...
[2024-09-30 00:28:22,714][1149865] Num frames 11000...
[2024-09-30 00:28:22,806][1149865] Num frames 11100...
[2024-09-30 00:28:22,898][1149865] Num frames 11200...
[2024-09-30 00:28:22,990][1149865] Num frames 11300...
[2024-09-30 00:28:23,082][1149865] Num frames 11400...
[2024-09-30 00:28:23,165][1149865] Num frames 11500...
[2024-09-30 00:28:23,247][1149865] Num frames 11600...
[2024-09-30 00:28:23,338][1149865] Num frames 11700...
[2024-09-30 00:28:23,432][1149865] Num frames 11800...
[2024-09-30 00:28:23,522][1149865] Num frames 11900...
[2024-09-30 00:28:23,616][1149865] Num frames 12000...
[2024-09-30 00:28:23,730][1149865] Num frames 12100...
[2024-09-30 00:28:23,823][1149865] Num frames 12200...
[2024-09-30 00:28:23,904][1149865] Num frames 12300...
[2024-09-30 00:28:23,983][1149865] Num frames 12400...
[2024-09-30 00:28:24,063][1149865] Num frames 12500...
[2024-09-30 00:28:24,150][1149865] Num frames 12600...
[2024-09-30 00:28:24,230][1149865] Num frames 12700...
[2024-09-30 00:28:24,306][1149865] Num frames 12800...
[2024-09-30 00:28:24,390][1149865] Avg episode rewards: #0: 35.268, true rewards: #0: 14.268
[2024-09-30 00:28:24,391][1149865] Avg episode reward: 35.268, avg true_objective: 14.268
[2024-09-30 00:28:24,438][1149865] Num frames 12900...
[2024-09-30 00:28:24,516][1149865] Num frames 13000...
[2024-09-30 00:28:24,594][1149865] Num frames 13100...
[2024-09-30 00:28:24,675][1149865] Num frames 13200...
[2024-09-30 00:28:24,755][1149865] Num frames 13300...
[2024-09-30 00:28:24,833][1149865] Num frames 13400...
[2024-09-30 00:28:24,946][1149865] Avg episode rewards: #0: 33.076, true rewards: #0: 13.476
[2024-09-30 00:28:24,946][1149865] Avg episode reward: 33.076, avg true_objective: 13.476
[2024-09-30 00:28:42,313][1149865] Replay video saved to /home/luyang/workspace/rl/train_dir/default_experiment/replay.mp4!
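The "Avg episode rewards" lines during evaluation are running means over the episodes completed so far, so the individual episode returns can be recovered by differencing the cumulative sums. A sketch, with the ten averages copied from the lines above:

```python
# Recover per-episode returns from the running averages logged above.
running_avgs = [34.360, 40.959, 36.480, 31.192, 30.786,
                33.588, 34.030, 32.886, 35.268, 33.076]
cumsums = [avg * (i + 1) for i, avg in enumerate(running_avgs)]
returns = [round(c - p, 3) for c, p in zip(cumsums, [0.0] + cumsums[:-1])]
print(returns)
# ~[34.36, 47.558, 27.522, 15.328, 29.162, 47.598, 36.682, 24.878, 54.324, 13.348]
```

The spread (roughly 13 to 54 per episode against a final mean of 33.076) shows the policy is strong but not yet consistent across episodes.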